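I have a NetworkX graph (g) with ~25k nodes and 125k edges. I want to cache g using memcached, but g is too large. The most I can raise memcached's per-item size limit to is 32MB, which is not enough.
Should I even be trying to make this work with memcached?
What other options are there if I want to be able to store NetworkX graphs with up to 1m nodes and 10m edges?
- I am using Python. Sample code for creating the graphs is attached.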
import sys
import pickle
import random
import networkx as nx
from django.core.cache import cache


def randstring(x=3):
    # Random string of x uppercase ASCII letters
    return ''.join([chr(random.randrange(65, 91)) for _ in range(x)])


class Qux(object):
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar


# Build three graphs of increasing size and try to cache each one
for n, v in {1: 500, 2: 5000, 3: 50000}.items():
    g = nx.Graph()
    nodes = [Qux(randstring(), randstring()) for _ in range(v)]
    g.add_nodes_from(nodes)
    for node in g.nodes:
        # Connect each node to up to 24 randomly chosen nodes
        num = random.randrange(25)
        edges = [(node, random.choice(nodes)) for _ in range(num)]
        g.add_edges_from(edges)
    print(len(g.nodes), sys.getsizeof(pickle.dumps(g)))
    cache.set('{}/graph'.format(n), g, 3600)
Memcached console output (memcached -I 32M -vv):
<20 new auto-negotiating client connection
20: Client using the ascii protocol
<20 set :1:1/graph 1 3600 130678
>20 STORED
<20 delete :1:2/graph
>20 NOT_FOUND
<20 delete :1:3/graph
>20 NOT_FOUND
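If you look at NetworkX's code, a graph is just Python dictionaries. If the only functionality you need is access to nodes and edges, you can build the graph out of dictionaries and convert it to JSON, which lets you cache the graph. The code for adding edges to an undirected weighted graph is basically this: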
# edges is assumed to be an iterable of dicts with 'node1', 'node2' and 'weight' keys
G_New = {}
for edge in edges:
    # Undirected graph: record the edge under both endpoints
    try:
        G_New[edge['node1']].update({edge['node2']: edge['weight']})
    except KeyError:
        G_New[edge['node1']] = {edge['node2']: edge['weight']}
    try:
        G_New[edge['node2']].update({edge['node1']: edge['weight']})
    except KeyError:
        G_New[edge['node2']] = {edge['node1']: edge['weight']}
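After that, it is a simple json.dumps(G_New). For larger graphs, you can split the dictionary into smaller components and store each component in memcached. See also: How to split dictionary into multiple dictionaries fast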
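As a rough sketch of that splitting idea (not a tested production setup), the adjacency dictionary can be cut into fixed-size chunks, each stored as its own JSON string under its own key, with a small index entry recording how many chunks exist. The names split_adjacency, cache_graph, load_graph and DictCache are hypothetical and invented for this illustration. DictCache is only an in-memory stand-in with the same set/get calls as django.core.cache.cache, so the example runs without a memcached server. It also assumes node identifiers are strings, because JSON converts all object keys to strings.

```python
import json
from itertools import islice


def split_adjacency(adjacency, chunk_size):
    """Yield successive dicts holding at most chunk_size nodes each."""
    items = iter(adjacency.items())
    while True:
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            return
        yield chunk


def cache_graph(cache, name, adjacency, chunk_size=1000, timeout=3600):
    """Store the graph as JSON chunks plus an index entry with the chunk count."""
    count = 0
    for count, chunk in enumerate(split_adjacency(adjacency, chunk_size), 1):
        key = '{}/chunk/{}'.format(name, count - 1)
        cache.set(key, json.dumps(chunk), timeout)
    cache.set('{}/chunks'.format(name), count, timeout)


def load_graph(cache, name):
    """Rebuild the adjacency dict, or return None if any piece is missing."""
    count = cache.get('{}/chunks'.format(name))
    if count is None:
        return None
    adjacency = {}
    for i in range(count):
        payload = cache.get('{}/chunk/{}'.format(name, i))
        if payload is None:  # a chunk can expire or be evicted on its own
            return None
        adjacency.update(json.loads(payload))
    return adjacency


class DictCache(object):
    """Hypothetical in-memory stand-in for a cache with set/get methods."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, timeout=None):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


# Tiny undirected weighted graph in the same dict-of-dicts layout as G_New
G_New = {'A': {'B': 1.5, 'C': 2.0}, 'B': {'A': 1.5}, 'C': {'A': 2.0}}
fake_cache = DictCache()
cache_graph(fake_cache, 'demo', G_New, chunk_size=2)
restored = load_graph(fake_cache, 'demo')
```

With a real backend you would pass django.core.cache.cache instead of fake_cache and pick chunk_size so that each JSON string stays comfortably under the per-item limit. Because each chunk can disappear independently, load_graph treats a single missing chunk as a full cache miss and leaves rebuilding the graph to the caller.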