Word frequency of a CSV column in Python

Question

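I have a .csv file containing a column of messages I have collected, and I want a word frequency list for every word in that column. Here is what I have so far. I am not sure where I went wrong, and any help would be appreciated. Edit: the expected output is the whole list of words with their counts (no duplicates), written to another .csv file.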

import csv
from collections import Counter
from collections import defaultdict

output_file = 'comments_word_freqency.csv'
input_stream = open('comments.csv')
reader = csv.reader(input_stream, delimiter=',')
reader.next() #skip header
csvrow = [row[3] for row in reader] #Get the fourth column only

with open(output_file, 'rb') as csvfile:
    for row in reader:
        freq_dict = defaultdict(int) # the "int" part
                                    # means that the VALUES of the dictionary are integers.
        for line in csvrow:
            words = line.split(" ")
            for word in words:
                word = word.lower() # ignores case type
                freq_dict[word] += 1

        writer = csv.writer(open(output_file, "wb+")) # this is what lets you write the csv file.
        for key, value in freq_dict.items():
                        # this iterates through your dictionary and writes each pair as its own line.
            writer.writerow([key, value])

Answer 1

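I recently ran the code proposed by SAMO and hit a few problems under Python 3.6. So I am posting a working version (a few lines changed from SAMO's code), which may help others and save them some time.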

import csv
from collections import Counter

words = []
with open('data.csv', 'rt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    for col in reader:
        # split the first column on spaces and collect every word
        csv_words = col[0].split(" ")
        for word in csv_words:
            words.append(word)

# Counter tallies each distinct word once, so no duplicates are written
word_counts = Counter(words)

with open('frequency_result.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for word, count in word_counts.items():
        writer.writerow([word, count])  # one "word,count" row per word

Answer 2

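The code you posted is all over the place, but I think this is what you were going for. It returns a list of the words along with the number of times each one appears in the original file.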

import csv

words = []
with open('comments.csv', 'rb') as csvfile:  # 'rb' and reader.next() are Python 2 idioms
    reader = csv.reader(csvfile)
    reader.next()  # skip the header row
    for row in reader:
        csv_words = row[3].split(" ")  # fourth column only
        for i in csv_words:
            words.append(i)

# pair each word with the number of times it occurs
words_counted = []
for i in words:
    x = words.count(i)
    words_counted.append((i, x))

# write this to csv file
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(words_counted)

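Then, to remove the duplicates from the list, just call set() on it (do this before writing if each word should appear only once in the file):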

set(words_counted)

Your output will look like this:

'this', 2
'is', 1
'your', 3
'output', 5
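
For readers on Python 3, the same idea can be sketched with collections.Counter, which counts each distinct word once and so avoids both the repeated words.count() calls and the separate set() step. This is only a minimal sketch, not the code from either answer: the helper names count_words and write_counts, the "word"/"count" header row, and the in-memory sample data standing in for comments.csv are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def count_words(rows, column):
    """Return a Counter of lowercased words found in one column of the rows."""
    counts = Counter()
    for row in rows:
        if len(row) > column:  # skip short or blank rows
            counts.update(row[column].lower().split())
    return counts


def write_counts(counts, out_stream):
    """Write one 'word,count' line per distinct word, most frequent first."""
    writer = csv.writer(out_stream, lineterminator="\n")
    writer.writerow(["word", "count"])  # header row (an assumption)
    writer.writerows(counts.most_common())


# Small in-memory demo standing in for comments.csv (hypothetical data).
sample = io.StringIO(
    "id,user,date,message\n"
    "1,a,x,This is your output\n"
    "2,b,y,this output\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
counts = count_words(reader, column=3)  # fourth column, as in the question

result = io.StringIO()
write_counts(counts, result)
```

With real files, the two StringIO objects would be replaced by open('comments.csv', newline='') and open('output.csv', 'w', newline=''), which is how the csv module expects files to be opened in Python 3.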
