Tesseract.js OCR: how to correctly set the page segmentation mode (PSM, pageseg) to detect a single number in an image


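I have been using Tesseract to read various numbers (up to 99,999.9) in the following format:

Example of an image where OCR fails:

It reads the number correctly about 80% of the time, but I need 95% accuracy.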

async function runOCR(url) {
    const worker = await Tesseract.createWorker('eng', 1, {
        tessedit_pageseg_mode: 13,
        config: '--psm 13'
    });

    (async () => {
        await worker.load();
        await worker.loadLanguage('eng');
        await worker.initialize('eng');    
        
        await worker.setParameters({
            tessedit_ocr_engine_mode: Tesseract.OEM_TESSERACT_ONLY,
            tessedit_char_whitelist: '0123456789,.',
            preserve_interword_spaces: '0',
            SINGLE_WORD: true,
            tessedit_pageseg_mode: Tesseract.SINGLE_WORD,
        });
        const {
            data: { text },
        } = await worker.recognize(url);
        doSomething(text);
        await worker.terminate();
    })();
}

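The main problem is that I don't know where to set the page segmentation mode (PSM, pageseg). The examples I have found are either outdated or written for other languages.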

Here is the list of pageseg options I found in the Tesseract C++ source (https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163):

  PSM_OSD_ONLY,       ///< Orientation and script detection only.
  PSM_AUTO_OSD,       ///< Automatic page segmentation with orientation and
                      ///< script detection. (OSD)
  PSM_AUTO_ONLY,      ///< Automatic page segmentation, but no OSD, or OCR.
  PSM_AUTO,           ///< Fully automatic page segmentation, but no OSD.
  PSM_SINGLE_COLUMN,  ///< Assume a single column of text of variable sizes.
  PSM_SINGLE_BLOCK_VERT_TEXT,  ///< Assume a single uniform block of vertically
                               ///< aligned text.
  PSM_SINGLE_BLOCK,   ///< Assume a single uniform block of text. (Default.)
  PSM_SINGLE_LINE,    ///< Treat the image as a single text line.
  PSM_SINGLE_WORD,    ///< Treat the image as a single word.
  PSM_CIRCLE_WORD,    ///< Treat the image as a single word in a circle.
  PSM_SINGLE_CHAR,    ///< Treat the image as a single character.
  PSM_SPARSE_TEXT,    ///< Find as much text as possible in no particular order.
  PSM_SPARSE_TEXT_OSD,  ///< Sparse text with orientation and script det.
  PSM_RAW_LINE,       ///< Treat the image as a single text line, bypassing
                      ///< hacks that are Tesseract-specific.
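These modes are numbered 0 to 13 in the order listed, and that number is what --psm N and tessedit_pageseg_mode expect; the 13 used in the question is therefore PSM_RAW_LINE, not single word. The sketch below assumes only that ordering. Tesseract.js exposes the same values as strings on Tesseract.PSM (for example, Tesseract.PSM.SINGLE_LINE is '7'); the helper name psmValue is hypothetical.

```javascript
// Page segmentation modes in the order listed in publictypes.h;
// each mode's numeric value (as used by `--psm N`) is its index here.
const PSM_NAMES = [
  'PSM_OSD_ONLY',               // 0
  'PSM_AUTO_OSD',               // 1
  'PSM_AUTO_ONLY',              // 2
  'PSM_AUTO',                   // 3
  'PSM_SINGLE_COLUMN',          // 4
  'PSM_SINGLE_BLOCK_VERT_TEXT', // 5
  'PSM_SINGLE_BLOCK',           // 6
  'PSM_SINGLE_LINE',            // 7
  'PSM_SINGLE_WORD',            // 8
  'PSM_CIRCLE_WORD',            // 9
  'PSM_SINGLE_CHAR',            // 10
  'PSM_SPARSE_TEXT',            // 11
  'PSM_SPARSE_TEXT_OSD',        // 12
  'PSM_RAW_LINE',               // 13
];

// Hypothetical helper: the mode's number as a string, which is the form
// Tesseract.js uses for tessedit_pageseg_mode.
function psmValue(name) {
  const index = PSM_NAMES.indexOf(name);
  if (index === -1) {
    throw new Error(`Unknown page segmentation mode: ${name}`);
  }
  return String(index);
}

console.log(psmValue('PSM_SINGLE_LINE')); // prints 7
console.log(psmValue('PSM_RAW_LINE'));    // prints 13
```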

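How can I detect the numbers in the image more reliably, and how do I correctly set the page segmentation mode and other configuration? (None of the configuration changes I have made seem to affect my hit rate.)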
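To tell whether a configuration change really moves the hit rate, each configuration can be scored against the same set of labelled images. A minimal sketch, assuming you already know the expected reading for each sample image and have the raw text one configuration returned for it; the function names and the { expected, recognized } shape are hypothetical.

```javascript
// Hypothetical scoring helper. Each sample pairs the reading you expect
// for an image with the raw text one OCR configuration returned for it.
// Returns the fraction of samples that match exactly after trimming.
function hitRate(samples) {
  if (samples.length === 0) return 0;
  const hits = samples.filter(
    ({ expected, recognized }) => recognized.trim() === expected
  ).length;
  return hits / samples.length;
}

// True when a configuration reaches the target (95% in the question).
function meetsTarget(samples, target = 0.95) {
  return hitRate(samples) >= target;
}
```

Running every candidate configuration over the same images and comparing hitRate values makes small differences visible that are easy to miss by eye.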

javascript ocr tesseract python-tesseract tesseract.js
1 Answer (0 votes)

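I see tessedit_pageseg_mode: 13 passed to createWorker, and then tessedit_pageseg_mode: Tesseract.SINGLE_WORD passed to worker.setParameters.

Neither of these sets the mode you want. The third argument of createWorker holds worker options (such as logger), not Tesseract parameters, and Tesseract.SINGLE_WORD is undefined because the modes are exposed on Tesseract.PSM (Tesseract.PSM.SINGLE_WORD). You only need to set this parameter (the page segmentation mode) once, with setParameters, before calling recognize.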
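Setting the parameters once also lets a single worker be reused for many images instead of being created per image. A minimal sketch, assuming worker is a Tesseract.js worker (or any object with the same setParameters and recognize methods); recognizeAll is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: apply the parameters a single time, then run
// recognize on each image with the same worker and collect trimmed text.
// `worker` is assumed to be a Tesseract.js worker returned by createWorker.
async function recognizeAll(worker, params, images) {
  await worker.setParameters(params);
  const texts = [];
  for (const image of images) {
    // Await each image in turn and keep the text without trailing newline.
    const { data: { text } } = await worker.recognize(image);
    texts.push(text.trim());
  }
  return texts;
}
```

With a real worker this would be called as recognizeAll(worker, { tessedit_pageseg_mode: Tesseract.PSM.SINGLE_LINE, tessedit_char_whitelist: '0123456789.,' }, urls), followed by await worker.terminate() once all images are done.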

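To detect a single number in an image (such as the ones you provided), use PSM_SINGLE_LINE or PSM_SINGLE_WORD (Tesseract.PSM.SINGLE_LINE and Tesseract.PSM.SINGLE_WORD in Tesseract.js). They tell Tesseract to treat the whole image as one line or one word of text instead of searching for a page layout, which suits this kind of task.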

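async function runOCR(url) {
    // Tesseract.js v5: the language and OCR engine mode are passed to createWorker,
    // which also loads and initializes the worker, so load(), loadLanguage() and
    // initialize() are not needed. The engine mode cannot be changed later with
    // setParameters. The legacy engine (OEM.TESSERACT_ONLY) also needs the
    // legacy core and legacy language data.
    const worker = await Tesseract.createWorker('eng', Tesseract.OEM.TESSERACT_ONLY, {
        legacyCore: true,
        legacyLang: true,
        logger: m => console.log(m)
    });

    // Set the whitelist and page segmentation mode once, before recognize
    await worker.setParameters({
        tessedit_char_whitelist: '0123456789.,',
        tessedit_pageseg_mode: Tesseract.PSM.SINGLE_LINE // or Tesseract.PSM.SINGLE_WORD if single-line mode does not work well
    });

    // Now recognize the number in the image
    const { data: { text } } = await worker.recognize(url);
    doSomething(text);

    await worker.terminate();
}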
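Because the readings follow a known format (up to 99,999.9), the text can also be validated after recognition, and an invalid reading retried with the other suggested mode. A sketch under these assumptions: the format is digits with an optional thousands comma and at most one decimal digit, as in the question; parseReading and readNumber are hypothetical helper names; worker is a Tesseract.js worker, and '7' and '8' are the values of Tesseract.PSM.SINGLE_LINE and Tesseract.PSM.SINGLE_WORD.

```javascript
// Assumed format: 0 to 99,999.9, with an optional thousands comma
// and at most one digit after the decimal point.
const READING = /^(\d{1,2},?\d{3}|\d{1,3})(\.\d)?$/;

// Hypothetical helper: return the reading as a number, or null if the
// OCR text does not match the expected format.
function parseReading(text) {
  // Drop all whitespace, including the trailing newline in OCR output.
  const cleaned = text.replace(/\s+/g, '');
  if (!READING.test(cleaned)) return null;
  return Number(cleaned.replace(',', ''));
}

// Hypothetical helper: try single-line mode first and fall back to
// single-word mode when the result is not a valid reading.
async function readNumber(worker, image, modes = ['7', '8']) {
  for (const mode of modes) {
    await worker.setParameters({
      tessedit_pageseg_mode: mode,
      tessedit_char_whitelist: '0123456789.,',
    });
    const { data: { text } } = await worker.recognize(image);
    const value = parseReading(text);
    if (value !== null) return value;
  }
  return null; // no mode produced a valid reading
}
```

Rejecting readings that cannot be valid (such as 99.999.9) does not make Tesseract itself more accurate, but it turns silent misreads into retries or explicit failures.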
© www.soinside.com 2019 - 2024. All rights reserved.