itertuples is a good way to iterate over a pandas DataFrame; it returns a named tuple for each row.
import pandas as pd
import numpy as np
df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]}, index=['dog', 'hawk'])
for row in df.itertuples():
    print(type(row))
    print(row)
<class 'pandas.core.frame.Pandas'>
Pandas(Index='dog', num_legs=4, num_wings=0)
<class 'pandas.core.frame.Pandas'>
Pandas(Index='hawk', num_legs=2, num_wings=2)
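What is the correct way to add a type hint to the returned named tuple?
I don't think this is possible: a DataFrame can hold arbitrary data types, so the tuples can contain any of the types present in the DataFrame. Just as you cannot use Python type hints to specify the column types of a DataFrame, you cannot explicitly type those named tuples.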
If you need the column types before entering the for loop, you can of course use df.dtypes, which gives you a Series holding the dtype of each column.
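As a small illustration of the df.dtypes suggestion, the sketch below inspects the column types of the question's DataFrame before any iteration happens. The variable names column_types and all_integer are made up for this example, and pd.api.types.is_integer_dtype is used only so that the check does not depend on the platform's default integer width.

```python
import pandas as pd

df = pd.DataFrame({"num_legs": [4, 2], "num_wings": [0, 2]}, index=["dog", "hawk"])

# df.dtypes is a Series indexed by column name, holding one dtype per column.
column_types = df.dtypes
print(column_types)

# Check every column up front, before entering the loop over rows.
all_integer = all(pd.api.types.is_integer_dtype(dtype) for dtype in column_types)
print(all_integer)
```

This only describes whole columns; it says nothing to a static type checker about the attributes of each row tuple.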
If the column names and data types are fixed, one possible solution is to declare the structure of a df row explicitly as a NamedTuple:
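from typing import NamedTuple
import pandas as pd
# Field names and order mirror what itertuples() yields:
# the index first (named "Index"), then the columns.
Row = NamedTuple(
    "Row",
    [("Index", str), ("num_legs", int), ("num_wings", int)],
)
df = pd.DataFrame(
    {"num_legs": [4, 2], "num_wings": [0, 2]}, index=["dog", "hawk"]
)
# Declares the loop variable's type for the IDE or type checker; nothing is checked at runtime.
row: Row
for row in df.itertuples():
    row.num_legs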
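A possible variation on this idea, sketched under the same assumption of fixed column names and types, is to rebuild each tuple as a real Row instance so that the loop variable actually has the declared type. The helper iter_rows is hypothetical (it is not part of pandas); it relies on itertuples(name=None) yielding plain tuples with the index value first.

```python
from typing import Iterator, NamedTuple

import pandas as pd


class Row(NamedTuple):
    Index: str
    num_legs: int
    num_wings: int


def iter_rows(df: pd.DataFrame) -> Iterator[Row]:
    """Hypothetical helper: yield each DataFrame row as a Row instance."""
    # name=None makes itertuples() yield plain tuples (index value first),
    # which Row._make() turns into Row instances.
    for values in df.itertuples(name=None):
        yield Row._make(values)


df = pd.DataFrame({"num_legs": [4, 2], "num_wings": [0, 2]}, index=["dog", "hawk"])

rows = list(iter_rows(df))
for row in rows:
    print(row.Index, row.num_legs, row.num_wings)
```

Row._make does not validate the values against the annotations, so this still only helps static tooling; it also assumes the DataFrame has exactly the columns listed in Row, in that order.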
import pandas as pd
import numpy as np
df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]}, index=['dog', 'hawk'])
for row in df.itertuples():
    print(type(row))
    print(row)
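You'll notice that the type is reported as pandas.core.frame.Pandas, but using that name as a type hint raises an error, because the Pandas class is a namedtuple that itertuples creates dynamically. You need to annotate with pd.core.frame.pandas instead: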
import pandas as pd
import numpy as np
# The annotation is only a hint; Python does not enforce it when test_chk is called.
def test_chk(row2chk: pd.core.frame.pandas):
    print(row2chk)
    print(row2chk.num_legs)  # prints the value in the num_legs column
df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]}, index=['dog', 'hawk'])
for row in df.itertuples():
    print(type(row))
    test_chk(row)
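To make it clearer what kind of object is being annotated here, the following sketch inspects the row class at runtime using the question's DataFrame. The variable names are made up for this example; the name= argument of itertuples is the documented way to change the generated class name.

```python
import pandas as pd

df = pd.DataFrame({"num_legs": [4, 2], "num_wings": [0, 2]}, index=["dog", "hawk"])

# Take the first row produced by itertuples() and look at its class.
first_row = next(iter(df.itertuples()))
row_class_name = type(first_row).__name__
row_fields = first_row._fields
is_tuple_subclass = isinstance(first_row, tuple)
print(row_class_name, row_fields, is_tuple_subclass)

# The class name is just the `name` argument of itertuples(), "Pandas" by default.
custom_row = next(iter(df.itertuples(name="Animal")))
custom_class_name = type(custom_row).__name__
print(custom_class_name)
```

Because the class is generated per call from the column names, a generic annotation such as tuple is always accurate, while attribute-level hints need a separately declared NamedTuple.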
This is a slightly modified version of Bravhek's answer, with actual type checking:
from typing import NamedTuple, get_type_hints
import pandas as pd
Row = NamedTuple(
    "Row",
    [("Index", str), ("num_legs", int), ("num_wings", int)],
)
df = pd.DataFrame(
    {"num_legs": [4, 2, 'a'], "num_wings": [0, 2, 3]}, index=["dog", "hawk", "bad_record"]
)
# Just a protocol type hint:
row: Row
for row in df.itertuples():
    print(row.num_legs)
# Actual type checking:
if set(Row._fields) != set(df.columns.tolist()) | {'Index'}:
    print('columns mismatch')
for row in df.itertuples():
    for fn in Row._fields:
        if not isinstance(getattr(row, fn), get_type_hints(Row)[fn]):
            print('type mismatch in column "{}", row "{}"'.format(fn, row))
    print(row.num_legs)
It prints the following:
4
2
a
4
2
type mismatch in column "num_legs", row "Pandas(Index='bad_record', num_legs='a', num_wings=3)"
a
The "protocol type hint" part is useful for silencing IDE warnings (for example, PyCharm's "Unresolved attribute reference"), but it does not validate anything.
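If the runtime check is needed in more than one place, it could be wrapped in a function that returns the mismatches instead of printing them. This is only a sketch of that idea: find_type_mismatches is a hypothetical helper name, and it assumes every field annotation in the NamedTuple is a plain class that isinstance accepts (not something like Optional[int]).

```python
from typing import List, NamedTuple, Tuple, get_type_hints

import pandas as pd


class Row(NamedTuple):
    Index: str
    num_legs: int
    num_wings: int


def find_type_mismatches(df: pd.DataFrame, row_type: type) -> List[Tuple[object, str]]:
    """Hypothetical helper: return (index value, field name) for each bad cell."""
    hints = get_type_hints(row_type)  # resolved once, outside the row loop
    mismatches = []
    for row in df.itertuples():
        for field, expected in hints.items():
            if not isinstance(getattr(row, field), expected):
                mismatches.append((row.Index, field))
    return mismatches


df = pd.DataFrame(
    {"num_legs": [4, 2, "a"], "num_wings": [0, 2, 3]},
    index=["dog", "hawk", "bad_record"],
)

mismatches = find_type_mismatches(df, Row)
print(mismatches)
```

Returning the mismatches lets the caller decide whether to log them, skip the rows, or raise an error.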