Can the same SQL Server temporary table persist and be used by multiple independently executed Python scripts without being recreated each time?

Question

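I am fetching data from a SQL Server database and saving it to files for subsequent processing in Python.

I am using Make to automate fetching and re-fetching the data (if some settings change, only the affected queries are re-run, not all of them). So I have a simple Makefile like this: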

rawdata: datafile1.h5 datafile2.h5 # ... more files like this
datafile1.h5: data1_specific_config.py common_config.py
	python fetch_data1.py
datafile2.h5: data2_specific_config.py common_config.py
	python fetch_data2.py
# ... similar rules for other files

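and when needed I just run make rawdata.

Now, all the SQL queries executed by the fetch_dataN.py scripts share a significant common part. Schematically, the queryN run by fetch_dataN.py looks like this: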

select ... into ##common_tmp_table ... /*this is identical for all queries*/
select ... from (... ##common_tmp_table ...) /*this is queryN specific; but the same ##common_tmp_table is used*/

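Here is the problem: when I run make rawdata in a situation where, say, five different data files need to be rebuilt, the same query select ... into ##common_tmp_table ... is run five times and writes the same output into ##common_tmp_table each time. That query takes a long time to run, so re-executing it five times slows everything down significantly.

But the temporary table is always dropped when a fetch_dataN.py script finishes, because the database connection that created it is closed.

Question:

Is there a way to force the table ##common_tmp_table to be created only once and to persist across all the fetch_dataN.py scripts executed by make rawdata?

In particular, is there a way to use the same database connection in all the scripts run by make rawdata? Or to open one extra connection that stays alive while all the scripts run and prevents the global temporary table from being dropped?

A workaround I know of: I can create ##common_tmp_table manually (e.g. in MS SQL Server Management Studio) before running make rawdata and keep the connection used for this open until all the scripts finish. But this is obviously ugly and annoying.

If make rawdata could start a separate process that opens a connection, creates the tmp table, and waits until everything else has finished, that would be a solution. But I don't know whether this is possible.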

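Constraints:

I cannot make changes in the database (such as creating a permanent table instead of a temporary one)
I need the scripts to stay separate so that they can be executed independently (putting everything into one script, with one database connection and therefore one tmp table, would not help: re-fetching all the data files whenever only one or two of them need rebuilding would be even slower)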

Notes:

MS SQL Server 2008 R2
pyodbc 4.0.28 (for connecting to the database)
python 3.7.6
make 4.3
conda 4.7.12

Thanks.

Answer 1

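So I found a solution that works well. The idea is to let make rawdata execute a Python script that:

opens a database connection and keeps it open
creates ##common_tmp_table
runs make rawdata_, which takes care of rebuilding the data files (the same as make rawdata in the code posted in the question, but now the queries no longer contain select ... into ##common_tmp_table ...)
closes the connection

The code:

Makefile: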

# THIS IS NEW
# always rebuild the rawdata target
.PHONY: rawdata
# just call a script that (among other stuff) runs `make rawdata_`
rawdata:
	python fetch_all_non_uptodate.py

# THE REST IS AS BEFORE (just added underscore)
rawdata_: datafile1.h5 datafile2.h5 # ... more files like this
datafile1.h5: data1_specific_config.py common_config.py
	python fetch_data1.py
datafile2.h5: data2_specific_config.py common_config.py
	python fetch_data2.py
# ... similar rules for other files

fetch_all_non_uptodate.py:

import subprocess
import pyodbc

conn = pyodbc.connect(...) #open db connection

# simulate the run of make with the -q flag to find out whether all the datafiles are up-to-date (return code 0) or not (return code 1); nothing is re-fetched as yet
uptodate = (subprocess.run(['make', '-q', 'rawdata_']).returncode == 0)

# if the raw datafiles are not up-to-date
if not uptodate:    
    create_common_tmp_table(conn) # create the ##common_tmp_table in the db and keep it while conn is open
    conn.commit() #commit the creation of the tmp table (Important! - otherwise the other connections won't see it!)
    subprocess.run(['make', 'rawdata_']) # run make to re-fetch whatever datafiles need to be re-fetched
                                         # the queries can make use of the existing tmp table
# otherwise we just simulate the make output telling that all is up-to-date
else:
    print("make: Nothing to be done for 'rawdata'.")

conn.close()
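
The script above closes the connection only if nothing raises along the way, and it does not pass the exit status of make back to the caller. A minimal sketch of a more defensive variant follows; the function name rebuild_with_shared_setup and its parameters are made up for illustration, and setup stands in for whatever creates ##common_tmp_table (create_common_tmp_table in the script above). It only assumes that conn has commit() and close(), which a pyodbc connection does.

```python
import subprocess
from contextlib import closing


def rebuild_with_shared_setup(conn, setup, check_cmd, build_cmd):
    """Run build_cmd while conn stays open and return the exit code to pass on.

    conn      -- object with commit() and close(), e.g. a pyodbc connection
    setup     -- callable taking conn; assumed to create ##common_tmp_table
    check_cmd -- command that exits with 0 when nothing needs rebuilding,
                 e.g. ['make', '-q', 'rawdata_']
    build_cmd -- command that does the rebuild, e.g. ['make', 'rawdata_']
    """
    with closing(conn):  # conn.close() runs even if setup or the build raises
        if subprocess.run(check_cmd).returncode == 0:
            return 0  # everything is up to date, the temp table is not needed
        setup(conn)
        conn.commit()  # without the commit other connections won't see the table
        return subprocess.run(build_cmd).returncode
```

The script could then end with sys.exit(rebuild_with_shared_setup(conn, create_common_tmp_table, ['make', '-q', 'rawdata_'], ['make', 'rawdata_'])), so that a failed fetch also makes the outer make rawdata fail.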

queryN:

/*keep just the specific part - the ##common_tmp_table already exists*/
select ... from (... ##common_tmp_table ...)
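
One consequence of this setup is that a fetch_dataN.py script run on its own, outside make rawdata, will fail because ##common_tmp_table does not exist. A small guard at the start of each script can turn that into a clearer error. The sketch below is only an illustration: the helper names are hypothetical, and it assumes a DB-API cursor (such as one from pyodbc) connected to SQL Server, where OBJECT_ID returns NULL for a name that does not exist.

```python
def common_tmp_table_exists(cursor, table="##common_tmp_table"):
    """Return True if the global temp table is visible on this connection."""
    # Temp tables live in tempdb; OBJECT_ID yields NULL for an unknown name.
    cursor.execute("SELECT OBJECT_ID(?)", ("tempdb.." + table,))
    return cursor.fetchone()[0] is not None


def require_common_tmp_table(cursor, table="##common_tmp_table"):
    """Raise a descriptive error if the shared temp table is missing."""
    if not common_tmp_table_exists(cursor, table):
        raise RuntimeError(
            table + " not found; run `make rawdata` so that "
            "fetch_all_non_uptodate.py creates it first"
        )
```

Calling require_common_tmp_table(conn.cursor()) before running queryN would be one way to use it.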
