Python: number of occurrences in a dict from another list

Question (votes: 0, answers: 3)

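I am trying to count how many times words appear in a word-count column, based on a subset of words I am interested in.

First I import my data: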

import graphlab
# Load the SFrame and add a column holding a {word: count} dict for each review.
products = graphlab.SFrame('amazon_baby.gl/')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
products.head(5)

The data can be found here: https://drive.google.com/open?id=0BzbhZp-qIglxM3VSVWRsVFRhTWc

Then I create the list of words I am interested in:

words = ['awesome', 'great', 'fantastic']

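I want to count the number of times each word in words appears in products['word_count'].

I am not married to using graphlab. It was just suggested to me by a colleague.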

python word-count graphlab sframe
3 answers
1 vote

Well, I am not quite sure what you mean by "dictionary column". If it is a list of texts:

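import operator
dictionary = {'texts': ['red blue blue', 'red black', 'blue white white', 'red', 'white', 'black', 'blue red']}
words = ['red', 'white', 'blue']
# Start every word of interest at zero so no try/except is needed.
freqs = dict.fromkeys(words, 0)
for t in dictionary['texts']:
    for w in words:
        # str.count counts substring matches, not whole words.
        freqs[w] += t.count(w)
# Sort the (word, count) pairs by count, highest first.
top_words = sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)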

If it is just a single text:

import operator
dictionary = {'text': 'red blue blue red black blue white white red white black blue red'}
words = ['red', 'white', 'blue']
# Each word is visited only once, so a plain assignment replaces the try/except.
freqs = {w: dictionary['text'].count(w) for w in words}
# Sort the (word, count) pairs by count, highest first.
top_words = sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)
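
One caveat with both snippets above: str.count counts substring matches, so a short word is also counted when it occurs inside a longer one. A minimal sketch of the difference, using a made-up sentence that is not from the question's data. The helper name count_whole_words is hypothetical, and it only splits on whitespace, so punctuation stays attached to the tokens:

```python
def count_whole_words(texts, words):
    """Count whole-word occurrences of each word across a list of strings."""
    freqs = dict.fromkeys(words, 0)
    for text in texts:
        tokens = text.split()  # whitespace tokenisation only
        for w in words:
            freqs[w] += tokens.count(w)
    return freqs

sample = 'great greatest great'
substring_hits = sample.count('great')                            # also matches inside 'greatest'
whole_word_hits = count_whole_words([sample], ['great'])['great']  # exact tokens only
print(substring_hits, whole_word_hits)
```

For real review text it would probably also be worth lowercasing and stripping punctuation before counting.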

1 vote

If you want to count occurrences of words, a quick way is to use the Counter object from the collections module.

For example:

In [3]: from collections import Counter
In [4]: c = Counter(['hello', 'world'])

In [5]: c
Out[5]: Counter({'hello': 1, 'world': 1})

Can you show the output of your products.head(5) command?
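
Applied to the word list from the question, the Counter idea might look roughly like the sketch below. The reviews list is a hypothetical stand-in for the review column (the real data is not shown here), and the tokenisation is just lowercase plus a whitespace split, which is an assumption rather than what graphlab's count_words does:

```python
from collections import Counter

# Hypothetical sample reviews standing in for products['review'].
reviews = ['Awesome stroller, great price', 'great great value', 'not fantastic']
words = ['awesome', 'great', 'fantastic']

totals = Counter()
for review in reviews:
    # Lowercase and split on whitespace; punctuation stays attached to tokens.
    totals.update(review.lower().split())

# A Counter returns 0 for keys it has never seen, so missing words are safe.
counts = {w: totals[w] for w in words}
print(counts)
```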


1 vote

If you want to stick with graphlab (or SFrame), use the SArray.dict_trim_by_keys method. The documentation is here: https://dato.com/products/create/docs/generated/graphlab.SArray.dict_trim_by_keys.html

import graphlab as gl
sf = gl.SFrame({'review': ['what a good book', 'terrible book']})
sf['word_bag'] = gl.text_analytics.count_words(sf['review'])

keywords = ['good', 'book']
sf['key_words'] = sf['word_bag'].dict_trim_by_keys(keywords, exclude=False)
print(sf)

+------------------+---------------------+---------------------+
|      review      |       word_bag      |      key_words      |
+------------------+---------------------+---------------------+
| what a good book | {'a': 1, 'good':... | {'good': 1, 'boo... |
|  terrible book   | {'book': 1, 'ter... |     {'book': 1}     |
+------------------+---------------------+---------------------+ 
[2 rows x 3 columns]
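
The key_words column still holds one dictionary per row. If a single total per keyword is wanted, those per-row dictionaries can be summed. A sketch in plain Python, assuming the column has already been pulled into an ordinary list of dicts. The rows list below is written by hand from the two example reviews and is not the result of a graphlab call:

```python
from collections import Counter

keywords = ['good', 'book']
# Hand-written stand-in for the rows of sf['key_words'].
rows = [{'good': 1, 'book': 1}, {'book': 1}]

totals = Counter()
for row in rows:
    # Counter.update with a mapping adds its counts to the running totals.
    totals.update(row)

# Report every keyword, including any that never appeared (count 0).
keyword_totals = {k: totals[k] for k in keywords}
print(keyword_totals)
```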