Counting the frequency of each word

Question

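I have a directory containing some text files. How can I count the frequency of each word in each file? A word is a sequence of characters that may contain letters, digits, and underscores.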

Answer 1

Here is a solution that should count the frequency of every word in a file:

    private void countWordsInFile(string file, Dictionary<string, int> words)
    {
        var content = File.ReadAllText(file);

        var wordPattern = new Regex(@"\w+");

        foreach (Match match in wordPattern.Matches(content))
        {
            int currentCount=0;
            words.TryGetValue(match.Value, out currentCount);

            currentCount++;
            words[match.Value] = currentCount;
        }
    }

You can call this code like this:

        var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

        countWordsInFile("file1.txt", words);

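After this call, words contains every word in the file together with its frequency (for example, words["test"] returns the number of times "test" appears in the file content). If you need to accumulate results across multiple files, simply call the method for all files with the same dictionary. If you need separate results for each file, create a new dictionary each time and use a structure like the one @DarkGray suggested.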
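For the per-file case the question asks about, the idea above could be sketched roughly as follows. This is only an illustration: the class name DirectoryWordCounter, the method name CountWordsPerFile, and the "*.txt" search pattern are assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class DirectoryWordCounter
{
    // \w+ matches runs of letters, digits, and underscores, as in the answer above.
    private static readonly Regex WordPattern = new Regex(@"\w+");

    // Returns one word-frequency dictionary per file, keyed by file path.
    // Assumption: only "*.txt" files in the top-level directory are of interest.
    public static Dictionary<string, Dictionary<string, int>> CountWordsPerFile(string directory)
    {
        var result = new Dictionary<string, Dictionary<string, int>>();

        foreach (var file in Directory.EnumerateFiles(directory, "*.txt"))
        {
            // A fresh dictionary per file keeps the results separate.
            var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);

            foreach (Match match in WordPattern.Matches(File.ReadAllText(file)))
            {
                // TryGetValue leaves count at 0 when the word is not yet present.
                words.TryGetValue(match.Value, out var count);
                words[match.Value] = count + 1;
            }

            result[file] = words;
        }

        return result;
    }
}
```

Passing the same dictionary to every file instead of creating a new one inside the loop would give the accumulated totals described above.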

Answer 2

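There is a LINQ-based alternative that I think is simpler. The key here is to use File.ReadLines (which reads lazily, which is nice) and string.Split, both built into the framework.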

private Dictionary<string, int> GetWordFrequency(string file)
{
    return File.ReadLines(file)
               .SelectMany(x => x.Split())
               .Where(x => x != string.Empty)
               .GroupBy(x => x)
               .ToDictionary(x => x.Key, x => x.Count());
}

To get the frequencies across many files, you can add an overload based on params.

private Dictionary<string, int> GetWordFrequency(params string[] files)
{
    return files.SelectMany(x => File.ReadLines(x))
                .SelectMany(x => x.Split())
                .Where(x => x != string.Empty)
                .GroupBy(x => x)
                .ToDictionary(x => x.Key, x => x.Count());
}
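Note that Split() with no arguments breaks on whitespace only, so a token such as "Hello," keeps its comma. If the question's definition of a word (letters, digits, underscores) matters, the same LINQ pipeline could be combined with the \w+ pattern from the first answer. The sketch below is one possible way to do that; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordFrequency
{
    // Counts words line by line, where a word is a run of \w characters.
    public static Dictionary<string, int> FromLines(IEnumerable<string> lines)
    {
        return lines.SelectMany(line => Regex.Matches(line, @"\w+").Cast<Match>())
                    .GroupBy(match => match.Value)
                    .ToDictionary(group => group.Key, group => group.Count());
    }

    // File.ReadLines streams the file lazily instead of loading it all at once.
    public static Dictionary<string, int> FromFile(string file)
    {
        return FromLines(File.ReadLines(file));
    }
}
```

Like the code above, this comparison is case-sensitive; passing a StringComparer to GroupBy and ToDictionary would change that.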

Answer 3

string input= File.ReadAllText(filename);
var arr = input.Split(' ');
// finding frequencies of words in a string
IDictionary<string, int> dict = new Dictionary<string, int>();
foreach (var item in arr)
{
    var count = 0;
    if (dict.TryGetValue(item, out count))
        dict[item] = ++count;
    else
        dict.Add(item, 1);
}

Answer 4

Counting words:

int WordCount(string text)
{
  var regex = new System.Text.RegularExpressions.Regex(@"\w+");

  var matches = regex.Matches(text);
  return matches.Count;     
}

Reading the text from a file:

string text = File.ReadAllText(filename);

Structure for the word counts:

class FileWordInfo
{
  public Dictionary<string, int> WordCounts = new Dictionary<string, int>();
}

List<FileWordInfo> fileInfos = new List<FileWordInfo>();

Answer 5

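@aKzenT's answer is good, but there is a problem! His code never checks whether the word already exists in the dictionary! So I modified the code as follows: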

private void countWordsInFile(string file, Dictionary<string, int> words)
{
    var content = File.ReadAllText(file);

    var wordPattern = new Regex(@"\w+");

    foreach (Match match in wordPattern.Matches(content))
    {
        if (!words.ContainsKey(match.Value))
            words.Add(match.Value, 1);
        else
            words[match.Value]++;
    }
}

Answer 6

In .NET 9, you can do the following with LINQ.

var wordFrequencies = text.Split()
                          .CountBy(word => word)
                          .OrderByDescending(freq => freq.Value);

foreach (var word in wordFrequencies)
{
    Console.WriteLine($"{word.Key}: {word.Value}");
}
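On frameworks older than .NET 9, where CountBy is not available, a roughly equivalent result can be obtained with GroupBy. The following is a sketch under that assumption; the class name WordFrequencyCompat and the method name Count are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequencyCompat
{
    // Returns (word, count) pairs ordered from most to least frequent.
    public static List<KeyValuePair<string, int>> Count(string text)
    {
        // An empty separator array makes Split break on whitespace;
        // RemoveEmptyEntries drops the empty strings between repeated spaces.
        return text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
                   .GroupBy(word => word)
                   .Select(group => new KeyValuePair<string, int>(group.Key, group.Count()))
                   .OrderByDescending(pair => pair.Value)
                   .ToList();
    }
}
```

The result can be printed with the same foreach loop as above, since each element exposes Key and Value.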
