I'm using the `transform` method of `Series`, but something about it confuses me. I searched the pandas documentation and Googled it, but couldn't find an answer.
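When I use `np.sum`, the result is:

    import numpy as np
    import pandas as pd

    s = pd.Series(range(1, 8), name='A')  # values 1..7 named 'A', matching the output below
    s.transform(lambda x: x + np.sum(x))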
    0     2
    1     4
    2     6
    3     8
    4    10
    5    12
    6    14
    Name: A, dtype: int64
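That output is what you would get if `x` were a single number: `np.sum` of a scalar is just the scalar, so the lambda doubles each value. A minimal check in plain Python, assuming the same values 1 to 7:

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1, 8), name='A')

# For a single number, np.sum returns that same number
print(np.sum(3))  # 3

# So adding np.sum(x) to a scalar x simply doubles it
doubled = [int(x + np.sum(x)) for x in s.tolist()]
print(doubled)  # [2, 4, 6, 8, 10, 12, 14]
```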
So I assumed `x` was an element of the `Series`. But when I use `x.sum()`, the result is:

    s.transform(lambda x: x + x.sum())
    0    29
    1    30
    2    31
    3    32
    4    33
    5    34
    6    35
    Name: A, dtype: int64
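These numbers fit `x` being the whole Series rather than one element: 1 + 2 + ... + 7 = 28, and adding 28 to every value gives 29 through 35. A minimal check, assuming the same values 1 to 7:

```python
import pandas as pd

s = pd.Series(range(1, 8), name='A')

total = int(s.sum())  # sum of the whole Series: 1 + 2 + ... + 7
print(total)  # 28

# Adding that single total to every element reproduces the output above
shifted = (s + total).tolist()
print(shifted)  # [29, 30, 31, 32, 33, 34, 35]
```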
Here `x` looks like a whole Series. When `s` is a DataFrame, I get the same result.

I'm confused. Could anyone help me understand this? Many thanks.
I read the source code and found that `transform` relies on `aggregate`, which first tries a regular apply:
    def aggregate(self, func, axis=0, *args, **kwargs):
        # Validate the axis parameter
        self._get_axis_number(axis)
        result, how = self._aggregate(func, *args, **kwargs)
        if result is None:
            # we can be called from an inner function which
            # passes this meta-data
            kwargs.pop('_axis', None)
            kwargs.pop('_level', None)
            # try a regular apply, this evaluates lambdas
            # row-by-row; however if the lambda is expected a Series
            # expression, e.g.: lambda x: x-x.quantile(0.25)
            # this will fail, so we can try a vectorized evaluation
            # we cannot FIRST try the vectorized evaluation, because
            # then .agg and .apply would have different semantics if the
            # operation is actually defined on the Series, e.g. str
            try:
                result = self.apply(func, *args, **kwargs)
            except (ValueError, AttributeError, TypeError):
                result = func(self, *args, **kwargs)
        return result
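Stripped of the bookkeeping, the callable path in that code is a try/fallback pattern. The sketch below is a simplified illustration, not pandas' actual implementation; `transform_sketch` is a hypothetical name, and it catches the same three exception types as the quoted source:

```python
import numpy as np
import pandas as pd


def transform_sketch(obj, func):
    """Hypothetical, simplified version of the fallback in the quoted aggregate()."""
    try:
        # First attempt: element-wise, like Series.apply (func sees one scalar at a time)
        return obj.apply(func)
    except (ValueError, AttributeError, TypeError):
        # Fallback: hand the whole Series to func in a single call
        return func(obj)


s = pd.Series(range(1, 8), name='A')

# np.sum works on a scalar, so the element-wise attempt succeeds
with_np_sum = transform_sketch(s, lambda x: x + np.sum(x)).tolist()
print(with_np_sum)  # [2, 4, 6, 8, 10, 12, 14]

# A plain int has no .sum(), so the element-wise attempt raises AttributeError
# and the fallback passes the entire Series instead
with_method = transform_sketch(s, lambda x: x + x.sum()).tolist()
print(with_method)  # [29, 30, 31, 32, 33, 34, 35]
```

The order is deliberate: as the comments in the quoted source say, trying the vectorized call first would give `.agg` and `.apply` different semantics for operations that are defined on the elements themselves.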
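So the user-defined function first receives scalars. With `np.sum` that works, because `np.sum` of a single number is just that number, so each value is simply doubled. With `x.sum()`, however, `transform` ends up calling `s.apply(lambda x: x + x.sum())`, which raises an `AttributeError` (a plain Python `int` has no `sum` method); the exception is caught, and the whole Series is then passed to the user-defined function. For example: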
    def func(x):
        print(type(x))
        print(x)
        return x + x.sum()

    s.transform(func)
    <class 'int'>
    1
    <class 'pandas.core.series.Series'>
    0    1
    1    2
    2    3
    3    4
    4    5
    5    6
    6    7
    Name: A, dtype: int64
    0    29
    1    30
    2    31
    3    32
    4    33
    5    34
    6    35
    Name: A, dtype: int64
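The DataFrame case from the question takes a slightly different route: `DataFrame.apply` passes each column to the function as a Series, so the first, regular attempt already sees a Series and `x.sum()` succeeds without any fallback. A small check, assuming a one-column DataFrame with column `A` holding the same values (the asker's actual frame is not shown); `record_type` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

# Assumed one-column frame; the original DataFrame is not shown in the question
df = pd.DataFrame({'A': range(1, 8)})

seen = []


def record_type(x):
    # Record what DataFrame.apply hands to the function on each call
    seen.append(type(x).__name__)
    return x + x.sum()


out = df.transform(record_type)
print(set(seen))          # {'Series'}
print(out['A'].tolist())  # [29, 30, 31, 32, 33, 34, 35]

# Because each call gets a whole column, even np.sum sums the column here
via_np_sum = df.transform(lambda x: x + np.sum(x))['A'].tolist()
print(via_np_sum)         # [29, 30, 31, 32, 33, 34, 35]
```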