Print the number of times a word occurs in a txt file

Question (votes: 0, answers: 2)

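I am trying to find the number of times the word "the" occurs in a txt file. With the code below I keep getting 0 as the output when it should be 4520. I use a delimiter to separate out "the", but it does not seem to be counted at all. The delimiter works when I use "[^a-zA-Z]+" to count all words.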

in.useDelimiter("[^the]+");
while (in.hasNext()) {
    String words = in.next();
    words = words.toLowerCase();
    wordCount++;
}
System.out.println("The total number of 'the' is " + theWord);
java java.util.scanner
2 Answers
2 votes

In Java 9+, you can count the number of times a word occurs in a text file as follows:

static long countWord(String filename, String word) throws IOException {
    Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
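    // Files.lines keeps the file open until the stream is closed, so use try-with-resources.
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        return lines.flatMap(s -> p.matcher(s).results()).count();
    }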
}

Test

System.out.println(countWord("test.txt", "the"));

test.txt

The quick brown fox
jumps over the lazy dog

Output

2
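
If you want to try this without creating test.txt by hand, here is a minimal self-contained sketch. The class name CountWordDemo, the demo() helper, and the temporary file are my own additions for illustration and are not part of the answer's code; countWord takes a Path here instead of a file name for the same reason.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class CountWordDemo {
    // Same idea as the Java 9+ version: count whole-word, case-insensitive matches per line.
    static long countWord(Path file, String word) throws IOException {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        try (Stream<String> lines = Files.lines(file)) {
            return lines.flatMap(s -> p.matcher(s).results()).count();
        }
    }

    // Hypothetical helper: writes the two sample lines to a temp file and counts "the".
    static long demo() {
        try {
            Path tmp = Files.createTempFile("countword", ".txt");
            try {
                Files.write(tmp, Arrays.asList("The quick brown fox", "jumps over the lazy dog"));
                return countWord(tmp, "the");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```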

Java 8 version:

static int countWord(String filename, String word) throws IOException {
    Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
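    // Close the stream returned by Files.lines so the file handle is released.
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        return lines.mapToInt(s -> {
            int count = 0;
            for (Matcher m = p.matcher(s); m.find(); )
                count++;
            return count;
        }).sum();
    }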
}

Java 7 version:

static int countWord(String filename, String word) throws IOException {
    Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
    int count = 0;
    try (BufferedReader in = Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
        for (String line; (line = in.readLine()) != null; )
            for (Matcher m = p.matcher(line); m.find(); )
                count++;
    }
    return count;
}
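
All three versions rely on the same two pieces: \\b so that only whole words match, and Pattern.quote so that the search word is taken literally. A small sketch on an in-memory string may make the effect easier to see. The class WordBoundaryDemo and the wholeWord flag are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts case-insensitive matches of a literal word, with or without \b anchors.
    static int count(String text, String word, boolean wholeWord) {
        String regex = Pattern.quote(word); // treat the word literally, e.g. "a.b"
        if (wholeWord) {
            regex = "\\b" + regex + "\\b";
        }
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "He gave them the other sword.";
        System.out.println(count(text, "the", true));  // prints 1: only the standalone "the"
        System.out.println(count(text, "the", false)); // prints 3: also inside "them" and "other"
    }
}
```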

1 vote

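Use \\b(?i)(the)\\b as the regex, where \\b denotes a word boundary, i makes the match case-insensitive, and (the) matches the word "the" as a whole. Note that [] matches any single one of the characters it encloses, not the enclosed text as a whole.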

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner in;
        try {
            in = new Scanner(new File("file.txt"));
            int wordCount = 0;
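            // Count the matches themselves. Using the regex as a delimiter and counting
            // tokens counts the text between matches, which only sometimes equals the match count.
            while (in.findWithinHorizon("\\b(?i)(the)\\b", 0) != null) {
                wordCount++;
            }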
            System.out.println("The total number of 'the' is " + wordCount);
        } catch (FileNotFoundException e) {
            System.out.println("File does not exist");
        }
    }
}

Output:

The total number of 'the' is 5

Contents of file.txt:

The cat jumped over the rat.
The is written as THE in capital letter.
He gave them the sword.
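
To illustrate the note about [] above, here is a rough sketch of what the question's delimiter "[^the]+" does to the first line of file.txt. The class CharClassDemo and the tokens helper are hypothetical names of mine. Because [^the] means "any character other than t, h, or e", the scanner returns runs of those three letters as tokens rather than occurrences of the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CharClassDemo {
    // Collects every token a Scanner produces for the given delimiter regex.
    static List<String> tokens(String text, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter(delimiter);
            while (in.hasNext()) {
                out.add(in.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens are runs made only of the characters t, h, and e.
        System.out.println(tokens("The cat jumped over the rat.", "[^the]+"));
        // prints [he, t, e, e, the, t]
    }
}
```

Only one of the six tokens is the word "the", so counting tokens with this delimiter does not count the word.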