I use pandas to manipulate dataframes and logging to record intermediate results, warnings, and errors into a dedicated log file. I also need to print some intermediate dataframes into the same log file. Specifically, I want to:

- print dataframes into the same log file as the logging messages (to make debugging easier and to avoid writing many intermediate files, as happens with to_csv calls that have a file target);
- have the logging level (e.g. DEBUG or INFO) control the verbosity of the dataframe output, as is usually done, shared with the verbosity of the other log messages (including those unrelated to dataframes);
- have every printed line carry the usual prefix, e.g. 240102 10:58:20 INFO:.

Please include a usage example.

Example:
import io
import logging
import pandas as pd
# Print into log this many lines of several intermediate dataframes,
# set to 20 or so:
MAX_NUM_DF_LOG_LINES = 4
logging.basicConfig(
    datefmt='%y%m%d %H:%M:%S',
    format='%(asctime)s %(levelname)s: %(message)s')
logger = logging.getLogger(__name__)
# Or logging.DEBUG, etc:
logger.setLevel(logging.INFO)
# Example of a simple log message:
logger.info('Reading input.')
TESTDATA="""
enzyme regions N length
AaaI all 10 238045
AaaI all 20 170393
AaaI captured 10 292735
AaaI captured 20 229824
AagI all 10 88337
AagI all 20 19144
AagI captured 10 34463
AagI captured 20 19220
"""
# Raw string avoids an invalid-escape-sequence warning:
df = pd.read_csv(io.StringIO(TESTDATA), sep=r'\s+')
# ...some code....
# Example of a log message with a chunk of a dataframe, here, using
# `head` (but this can be another method that slices a dataframe):
logger.debug('less important intermediate results: df:')
for line in df.head(MAX_NUM_DF_LOG_LINES).to_string().splitlines():
    logger.debug(line)
# ...more code....
logger.info('more important intermediate results: df:')
for line in df.head(MAX_NUM_DF_LOG_LINES).to_string().splitlines():
    logger.info(line)
# ...more code....
Prints:
240102 10:58:20 INFO: Reading input.
240102 10:58:20 INFO: more important intermediate results: df:
240102 10:58:20 INFO: enzyme regions N length
240102 10:58:20 INFO: 0 AaaI all 10 238045
240102 10:58:20 INFO: 1 AaaI all 20 170393
240102 10:58:20 INFO: 2 AaaI captured 10 292735
240102 10:58:20 INFO: 3 AaaI captured 20 229824
None of the related answers did exactly what I was trying to do (several use print rather than logging), but one got close. A comment there notes:

"Note that because map is lazy, the latter will only work on py2; on py3 you can do [logger.info(line) for line in 'line 1\nline 2\nline 3'.splitlines()]. – 九八, Jun 22, 2021 at 16:30".
Additionally, logging the whole head as a single multi-line message gives output like:
240102 12:27:19 INFO: dataframe head - enzyme regions N length
0 AaaI all 10 238045
1 AaaI all 20 170393
2 AaaI captured 10 292735
...
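That second format can be produced with a single logging call: pass DataFrame.to_string() as one message, so only the first line carries the timestamp/level prefix. A minimal sketch (the dataframe is rebuilt inline for self-containment):

```python
import logging

import pandas as pd

logging.basicConfig(
    datefmt='%y%m%d %H:%M:%S',
    format='%(asctime)s %(levelname)s: %(message)s',
    level=logging.INFO)
logger = logging.getLogger(__name__)

df = pd.DataFrame({'enzyme': ['AaaI', 'AaaI', 'AaaI'],
                   'regions': ['all', 'all', 'captured'],
                   'N': [10, 20, 10],
                   'length': [238045, 170393, 292735]})

# One call, one log record: the continuation lines carry no prefix,
# matching the sample output shown above.
logger.info('dataframe head - %s', df.head(3).to_string())
```

Note the trade-off: unlike the line-by-line approach, the continuation lines then lack the per-line timestamp/level prefix.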
def log_df(level, df, n_rows, header):
    if isinstance(level, str):
        level = getattr(logging, level)
    logger.log(level, header)
    for line in df.head(n_rows).to_string().splitlines():
        logger.log(level, line)

log_df("INFO", df, MAX_NUM_DF_LOG_LINES, 'more important intermediate results: df:')
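One possible refinement (my addition, not part of the original answer): guard the body with Logger.isEnabledFor, so the relatively expensive to_string() call is skipped entirely when the level is filtered out:

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

def log_df(level, df, n_rows, header):
    """Log the first n_rows of df at the given level, one line per record."""
    if isinstance(level, str):
        level = getattr(logging, level)
    # Skip the formatting work when the message would be dropped anyway.
    if not logger.isEnabledFor(level):
        return
    logger.log(level, header)
    for line in df.head(n_rows).to_string().splitlines():
        logger.log(level, line)
```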