Python: How to use a generator to avoid SQL memory issues

Question (votes: 4, answers: 1)

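I have the following method that accesses a MySQL database. The query runs on a server where I have no permission to change anything related to increasing memory. I am new to generators, have started reading more about them, and thought I could convert this method to use one.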

def getUNames(self):
    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    global_user_list = []
    try:
        self.gdbCursor.execute(globalUserQuery)
        rows = self.gdbCursor.fetchall()
        for row in rows:
            uName = unicode(row['gu_name'], 'utf-8')
            global_user_list.append(uName)
        return global_user_list
    except Exception, e:
        traceback.print_exc()

And I use it with the following code:

for user_name in getUNames():
...

This is the error I get from the server side:

^GOut of memory (Needed 725528 bytes)
Traceback (most recent call last):
...
packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (2008, 'MySQL client ran out of memory')

How should I use a generator to avoid this:

while true:
   self.gdbCursor.execute(globalUserQuery)
   row = self.gdbCursor.fetchone()
   if row is None: break
   yield row

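I am not sure whether the approach above is correct, since I expect my database method to produce a list. I think the best approach would be to fetch a chunk of the query results and return it as a list; once that chunk is consumed, the generator would provide the next one, for as long as the query keeps returning results.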

python mysql yield
1 answer

10 votes

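With MySQLdb, the default cursor loads the entire result set into a Python list as soon as cursor.execute(..) is called. For a large query this can cause a MemoryError whether or not you use a generator.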

Instead, use SSCursor or SSDictCursor. These keep the result set on the server side and let you iterate over the rows of the result set on the client side:

import MySQLdb  
import MySQLdb.cursors as cursors
import traceback

def getUNames(self):
    # You may of course want to define `self.gdbCursor` somewhere else...
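    conn = MySQLdb.connect(..., cursorclass=cursors.SSDictCursor)
    #                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    #             Set a server-side cursor class here. SSDictCursor returns
    #             rows as dicts, which the row['gu_name'] lookup below needs;
    #             with plain SSCursor rows are tuples, so use row[0] instead.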
    self.gdbCursor = conn.cursor()

    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    try:
        self.gdbCursor.execute(globalUserQuery)
        for row in self.gdbCursor:
            uName = unicode(row['gu_name'], 'utf-8')
            yield uName
    except Exception as e:
        traceback.print_exc()

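There is not much documentation on the difference between the default Cursor and SSCursor. The best source I know of is the docstrings of the Cursor MixIn classes themselves: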

The default cursor uses CursorStoreResultMixIn:

In [2]: import MySQLdb.cursors as cursors
In [8]: print(cursors.CursorStoreResultMixIn.__doc__)
This is a MixIn class which causes the entire result set to be
    stored on the client side, i.e. it uses mysql_store_result(). If the
    result set can be very large, consider adding a LIMIT clause to your
    query, or using CursorUseResultMixIn instead.

And SSCursor uses CursorUseResultMixIn:

In [9]: print(cursors.CursorUseResultMixIn.__doc__)
This is a MixIn class which causes the result set to be stored
    in the server and sent row-by-row to client side, i.e. it uses
    mysql_use_result(). You MUST retrieve the entire result set and
    close() the cursor before additional queries can be peformed on
    the connection.
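
The requirement in that docstring (read the whole result set, then close() the cursor) can be built into the generator itself. What follows is a minimal sketch and not part of the original answer: it uses the standard-library sqlite3 module as a stand-in so it runs without a MySQL server, and `iter_names` is a hypothetical helper name. With MySQLdb the same pattern should apply to a connection created with a server-side cursor class.

```python
import sqlite3
from contextlib import closing


def iter_names(conn, query):
    """Yield the first column of each row, closing the cursor afterwards."""
    # closing() calls cursor.close() when the generator is exhausted,
    # raises an error, or is closed explicitly by the caller.
    with closing(conn.cursor()) as cursor:
        cursor.execute(query)
        for row in cursor:
            yield row[0]


# Stand-in database (assumption: sqlite3 replaces MySQL for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE globaluser (gu_name TEXT, gu_locked INTEGER)")
conn.executemany(
    "INSERT INTO globaluser VALUES (?, ?)",
    [("alice", 0), ("bob", 1), ("carol", 0)],
)

query = "SELECT gu_name FROM globaluser WHERE gu_locked = 0 ORDER BY gu_name"
names = list(iter_names(conn, query))
print(names)
```

Because the `with` block lives inside the generator, the cursor is also closed if the caller stops early and calls `close()` on the generator.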

Since I changed getUNames into a generator, it can be used like this:

for row in self.getUNames():
    ...
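
If you would rather receive lists of rows in chunks, as the question suggests, the DB-API `fetchmany()` method can be wrapped in a generator. This is a rough sketch under assumptions: `iter_batches` and `batch_size` are hypothetical names, and sqlite3 stands in for MySQL so the example is runnable. Note that with MySQLdb this only saves memory in combination with SSCursor or SSDictCursor, because the default cursor has already stored the full result set by the time `execute()` returns.

```python
import sqlite3


def iter_batches(cursor, query, batch_size=1000):
    """Yield lists of at most `batch_size` rows until no rows remain."""
    cursor.execute(query)  # run the query once, outside the loop
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in database (assumption: sqlite3 replaces MySQL for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE globaluser (gu_name TEXT, gu_locked INTEGER)")
conn.executemany(
    "INSERT INTO globaluser VALUES (?, ?)",
    [("user%d" % i, i % 2) for i in range(10)],
)

query = "SELECT gu_name FROM globaluser WHERE gu_locked = 0 ORDER BY gu_name"
batches = [
    [row[0] for row in batch]
    for batch in iter_batches(conn.cursor(), query, batch_size=2)
]
conn.close()
print(batches)
```

The query is executed once before the loop; calling `execute()` inside the loop, as in the question's attempt, would restart the query on every iteration and keep returning the first row.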