I have written Python tests with pytest. The tests download test data and cache it locally by writing it to files.
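Now I am running the tests in parallel with pytest-xdist. How can I prevent parallel writes in the test fixtures? Several workers writing the same file at once corrupts the data and makes tests fail.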
Ideally, only one test process should download the data and cache it as a file.
You can use the filelock library to create a lock file for each download that happens in a fixture or test.
Here is an example function, `wait_other_writers()`, that does this:
```python
from __future__ import annotations

import os
from contextlib import contextmanager
from pathlib import Path

from filelock import FileLock


@contextmanager
def wait_other_writers(path: Path | str, timeout=120):
    """Wait for other potential writers writing the same file.

    - Works around issues when parallel unit tests and such
      try to write the same file

    Example:

    .. code-block:: python

        import tempfile
        import urllib.request
        from pathlib import Path

        import pandas as pd
        import pytest

        @pytest.fixture()
        def my_cached_test_data_frame() -> pd.DataFrame:
            # All tests use a cached dataset stored in the /tmp directory
            path = Path(tempfile.gettempdir()) / "my_shared_data.parquet"
            with wait_other_writers(path):
                # Another process may already have written the file
                if not path.exists():
                    # Download and write to cache
                    urllib.request.urlretrieve("https://example.com", path)
                return pd.read_parquet(path)

    :param path:
        File that is being written
    :param timeout:
        How many seconds to wait to acquire the lock file.
        Default 2 minutes.
    """
    if isinstance(path, str):
        path = Path(path)

    assert isinstance(path, Path), f"Not Path object: {path}"
    assert path.is_absolute(), (
        f"Did not get an absolute path: {path}\n"
        "Please use absolute paths for lock files to prevent polluting the local working directory."
    )

    # If we are writing to a new temp folder, create any parent paths
    os.makedirs(path.parent, exist_ok=True)

    # https://stackoverflow.com/a/60281933/315168
    lock_file = path.parent / (path.name + ".lock")
    lock = FileLock(lock_file, timeout=timeout)
    with lock:
        yield
```
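If you want to check the "take the lock, check for the file, download only if missing" pattern in isolation, here is a rough standard-library sketch. It is not the filelock-based function above: it swaps filelock for `fcntl.flock`, so it only works on POSIX systems (Linux, macOS). The names `exclusive_lock`, `fetch_once` and `run_demo` are hypothetical, and the "download" is simulated by logging the call and writing a string.

```python
import fcntl
import multiprocessing
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(path: Path):
    """Hold an exclusive advisory lock on '<path>.lock' (POSIX only, via fcntl)."""
    lock_path = path.with_name(path.name + ".lock")
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def fetch_once(dest: Path, log: Path) -> None:
    """Hypothetical fixture body: 'download' dest only if nobody has done it yet."""
    with exclusive_lock(dest):
        if not dest.exists():
            # Stand-in for the real download; the sleep widens the race window
            with open(log, "a") as fh:
                fh.write("download\n")
            time.sleep(0.1)
            dest.write_text("payload")


def run_demo(n_workers: int = 8):
    """Start n_workers processes at once; return (file contents, download count)."""
    # The "fork" start method lets each child inherit fetch_once without pickling it
    ctx = multiprocessing.get_context("fork")
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "data.txt"
        log = Path(tmp) / "downloads.log"
        procs = [ctx.Process(target=fetch_once, args=(dest, log)) for _ in range(n_workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        assert all(p.exitcode == 0 for p in procs)
        return dest.read_text(), len(log.read_text().splitlines())


if __name__ == "__main__":
    print(run_demo())  # ('payload', 1)
```

All eight processes start together, but only the first one to acquire the lock finds the file missing, so the log records a single download and every other process reads the finished file. Like any advisory lock, this only protects code paths that also take the lock.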
For an example of this function in use, see here.