I'm using np.genfromtxt to read a CSV file, and I'm trying to use the converters argument to preprocess each column.
CSV:
"","Col1","Col2","Col3"
"1","Cell.1",NA,1
"2","Cell.2",NA,NA
"3","Cell.3",1,NA
"4","Cell.4",NA,NA
"5","Cell.5",NA,NA
"6","Cell.6",1,NA
Code:
import numpy as np
filename = 'b.csv'
h = ("", "Col1", "Col2", "Col3")

def col1_converter(v):
    print(f'col1_converter {v = }')
    return v

def col2_converter(v):
    print(f'col2_converter {v = }')
    return v

def col3_converter(v):
    print(f'col3_converter {v = }')
    return v

a = np.genfromtxt(
    filename,
    delimiter=',',
    names=True,
    dtype=[None, np.dtype('U8'), np.dtype('U2'), np.dtype('U2')],
    usecols=range(1, len(h)),
    converters={1: col1_converter, 2: col2_converter, 3: col3_converter},
    deletechars='',
)
print()
print(a)
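When I put print statements in the converters, I see an extraneous row of 1s printed at the beginning that never shows up in the output array. Why am I seeing this row of 1s?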
col1_converter v = b'1'
col2_converter v = b'1'
col3_converter v = b'1'
col1_converter v = b'"Cell.1"'
col1_converter v = b'"Cell.2"'
col1_converter v = b'"Cell.3"'
col1_converter v = b'"Cell.4"'
col1_converter v = b'"Cell.5"'
col1_converter v = b'"Cell.6"'
col2_converter v = b'NA'
col2_converter v = b'NA'
col2_converter v = b'1'
col2_converter v = b'NA'
col2_converter v = b'NA'
col2_converter v = b'1'
col3_converter v = b'1'
col3_converter v = b'NA'
col3_converter v = b'NA'
col3_converter v = b'NA'
col3_converter v = b'NA'
col3_converter v = b'NA'
[('"Cell.1"', 'NA', '1') ('"Cell.2"', 'NA', 'NA') ('"Cell.3"', '1', 'NA')
('"Cell.4"', 'NA', 'NA') ('"Cell.5"', 'NA', 'NA') ('"Cell.6"', '1', 'NA')]
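TL;DR: Before doing any actual conversion, numpy "tests" each converter function by calling it with the argument '1' to find a reasonable default value for that column. This has no effect on the output, other than possibly changing the default value for a given column.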
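I found it odd that each converter was called once, and then the column 1 converter was called for every row, then the column 2 converter, and so on. That suggested the calls were coming from different places in the code. I used Python's traceback module to confirm: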
import traceback

def col1_converter(v):
    print(f'col1_converter {v = }')
    traceback.print_stack()  # show where this call came from
    return v
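Sure enough, every call to col1_converter had the same stack trace except the first one. I looked through that first stack trace and found this interesting line: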
File "/Users/rpmccarter/Library/Python/3.8/lib/python/site-packages/numpy/lib/_iotools.py", line 804, in update
tester = func(testing_value or '1')
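The same idea can be tried without numpy at all. The following is only a minimal sketch: probe and convert_rows are hypothetical stand-ins for the two places a library might call a converter from, and the converter records the name of whichever function called it instead of printing the whole stack.

```python
import traceback

callers = []  # name of the function that invoked the converter, per call


def converter(value):
    # extract_stack() lists frames oldest-first; [-1] is this function,
    # [-2] is whoever called it.
    callers.append(traceback.extract_stack()[-2].name)
    return value


def probe(func):
    # Hypothetical stand-in for a one-off "test" call made by a library.
    return func('1')


def convert_rows(func, rows):
    # Hypothetical stand-in for the per-row conversion loop.
    out = []
    for row in rows:
        out.append(func(row))
    return out


probe(converter)
convert_rows(converter, ['a', 'b'])
print(callers)
```

A call whose caller differs from all the others is the one worth inspecting, which is how the first call to col1_converter stood out here.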
Since numpy is open source, I went to the GitHub repo and opened the _iotools.py file. I found a brief explanation of why they call the converter here, and the converter call itself here:
testing_value : str, optional
    A string representing a standard input value of the converter.
    This string is used to help defining a reasonable default
    value.
...
try:
    tester = func(testing_value or '1')
except (TypeError, ValueError):
    tester = None
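To see the extra call in isolation, here is a small sketch that records every value a converter receives. The in-memory CSV, the column names, and recording_converter are made up for illustration. It assumes the same setup as the question (names=True, an explicit dtype, no filling_values); whether the converter receives bytes or str depends on the NumPy version and the encoding argument, so the converter handles both.

```python
import io

import numpy as np

calls = []  # every value passed to the converter, in call order


def recording_converter(value):
    calls.append(value)
    # Older NumPy defaults hand the converter bytes; decode so the
    # returned value is always a str.
    return value.decode() if isinstance(value, bytes) else value


# Hypothetical in-memory CSV standing in for b.csv: a header plus 3 rows.
csv_text = 'id,label\n1,NA\n2,7\n3,NA\n'

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    names=True,
    dtype=[('id', 'i8'), ('label', 'U2')],
    converters={1: recording_converter},
)

print(calls)  # the first entry is the test call, the rest are real cells
print(data['label'])
```

With three data rows the converter should be called four times, and the first value is the '1' test input rather than anything from the file. If a converter has side effects (logging, counters), that extra call is something to account for.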