Using pyodbc executemany to load a DataFrame into SQL Server

Question

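I'm trying to load data from a DataFrame into SQL Server using pyodbc. It inserts row by row and is very slow.

I tried two approaches I found online (on Medium), but neither improved performance at all.

I'm running against SQL Azure, so SQLAlchemy is not an easy way to connect. The approaches I followed are below. Is there any other way to improve bulk-load performance?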

Approach 1

cursor = sql_con.cursor()
cursor.fast_executemany = True
for row_count in range(0, df.shape[0]):
    chunk = df.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    for index, row in ProductInventory.iterrows():
        cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Approach 2

cursor = sql_con.cursor()
for row_count in range(0, ProductInventory.shape[0]):
    chunk = ProductInventory.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    for index, row in ProductInventory.iterrows():
        cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Can anyone tell me why performance doesn't improve by even 1%? It still takes the same amount of time.

Answer 1

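A few things:

Why are you iterating over ProductInventory twice?
Shouldn't the executemany call happen after you have built the whole tuple_of_tuples, or a batch of them? As written, each chunk holds a single row, so every executemany call still sends one row.
The pyodbc docs say that "running executemany() with fast_executemany=False is generally not going to be much faster than running multiple execute() commands directly." So you need to set cursor.fast_executemany=True in both examples (see https://github.com/mkleehammer/pyodbc/wiki/Cursor for more details and examples). I'm not sure why you left it out of example 2.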
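To make the second point concrete, here is a small sketch that counts how often executemany gets called under each pattern. It uses a stand-in cursor rather than a real database: CountingCursor is a hypothetical stub written for this illustration, not a pyodbc class, and the SQL string is never executed.

```python
class CountingCursor:
    """Stand-in for a database cursor that only records how it was called."""

    def __init__(self):
        self.calls = 0
        self.rows_sent = 0

    def executemany(self, sql, params):
        # Record one call and the number of parameter rows it carried.
        self.calls += 1
        self.rows_sent += len(params)


rows = [(i, i + 100) for i in range(1000)]
sql = "INSERT INTO table ([x],[Y]) values (?,?)"

# Pattern from the question: one-row "chunks", so one call per row.
per_row = CountingCursor()
for i in range(len(rows)):
    per_row.executemany(sql, rows[i:i + 1])

# Batched pattern: build a batch first, then call executemany once per batch.
batched = CountingCursor()
batch_size = 500
for start in range(0, len(rows), batch_size):
    batched.executemany(sql, rows[start:start + batch_size])

print(per_row.calls, batched.calls)
```

Both loops send the same 1000 rows, but the first makes 1000 calls and the second makes 2. The inner iterrows loop in the question multiplies the call count further, and it also inserts each row once per row of ProductInventory.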

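Here is an example of how to do what I think you're trying to do. The math.ceil and the conditional expression in end_idx = ... account for the last batch, which may be smaller than the rest. In the sample run further down, there are 10 rows and a batch size of 3, so you end up with 4 batches, the last of which has only 1 tuple.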

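import math

df = ProductInventory
batch_size = 500
num_batches = math.ceil(len(df) / batch_size)

cursor = sql_con.cursor()
cursor.fast_executemany = True  # see the pyodbc docs quoted above

for i in range(num_batches):
    start_idx = i * batch_size
    # The last batch takes whatever rows remain, so it may be smaller.
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    # One executemany call per batch, made after the batch is fully built.
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

sql_con.commit()  # needed unless the connection was opened with autocommit=True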

Sample output:

=== Executing: ===
import math
import pandas as pd

df = pd.DataFrame({'a': range(1,11), 'b': range(101,111)})

batch_size = 3
num_batches = math.ceil(len(df)/batch_size)

for i in range(num_batches):
    start_idx = i * batch_size
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    print(tuple_of_tuples)

=== Output: ===
((1, 101), (2, 102), (3, 103))
((4, 104), (5, 105), (6, 106))
((7, 107), (8, 108), (9, 109))
((10, 110),)
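
The same batching can also be written with a stepped range, which avoids both math.ceil and the special case for the last batch. This is only a minimal sketch: iter_batches is a hypothetical helper name (not part of pandas or pyodbc), and it assumes rows is a list of row tuples. For a DataFrame, list(df.itertuples(index=False, name=None)) is one way to produce such a list.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        # Slicing past the end is safe, so the last batch is simply shorter.
        yield rows[start:start + batch_size]


# Same data as the sample run above: 10 rows, batch size 3.
rows = [(a, a + 100) for a in range(1, 11)]
batches = list(iter_batches(rows, 3))
for batch in batches:
    print(batch)
```

Each yielded batch could then be passed to cursor.executemany(...) in place of tuple_of_tuples.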
