The goal is to find the people who made an entry on 3 consecutive dates. My DataFrame looks like this:
DateEntry Person
1 2018-03-18 A
2 2018-03-19 A
3 2018-03-21 A
4 2018-09-25 B
5 2018-09-26 B
6 2018-09-27 B
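The only way I know to check this is to convert the dates to a list of strings and then check that list. It works, but I am not allowed to use that method.
Is there a way to iterate over the rows of the DataFrame with pandas to find the answer?
I only want to display the output below; I don't need to save the result in a DataFrame. Expected output: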
Person A did not enter on 3 consecutive days.
Person B did enter on three consecutive days.
Consecutive days entered by person B:
2018-09-25
2018-09-26
2018-09-27
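Use GroupBy.apply with a custom function that uses strides to build sliding windows of 3 dates and checks whether any window holds 3 consecutive days. This assumes the datetimes are sorted within each person.
If you also need the dates themselves, see the second version further down, which returns them for a larger sample DataFrame. The first version only reports whether each person has such a run: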
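import numpy as np
import pandas as pd

df['DateEntry'] = pd.to_datetime(df['DateEntry'])

def rolling_window(a, window):
    # Each row of the result is one sliding window over `a`;
    # max(..., 0) gives an empty result for groups shorter than `window`.
    shape = a.shape[:-1] + (max(a.shape[-1] - window + 1, 0), window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def f(x):
    vals = rolling_window(x.to_numpy(), 3)
    # True where neighbouring dates in a window are 1 whole day apart.
    dif = np.diff(vals, axis=1).astype("timedelta64[D]")==np.array([1], dtype='timedelta64[D]')
    # A person qualifies if any window has both gaps equal to 1 day.
    return dif.all(axis=1).any()

s = df.groupby('Person')['DateEntry'].apply(f)
print (s)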
Person
A False
B True
Name: DateEntry, dtype: bool
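To make the strides trick less opaque, here is a minimal sketch of what the sliding-window helper returns on a small integer array. The array `a` is made up for illustration, and the bounds-checked alternative assumes NumPy 1.20 or later, where np.lib.stride_tricks.sliding_window_view is available.

```python
import numpy as np

def rolling_window(a, window):
    # Sliding-window helper as in the answer: each row is one window over `a`.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Hypothetical day numbers: 1, 2, 3 are consecutive; 5 breaks the run.
a = np.array([1, 2, 3, 5, 6])

windows = rolling_window(a, 3)
print(windows)
# [[1 2 3]
#  [2 3 5]
#  [3 5 6]]

# Bounds-checked way to build the same windows (assumes NumPy >= 1.20).
safe = np.lib.stride_tricks.sliding_window_view(a, 3)

# A window is a consecutive run when every neighbouring difference is 1.
is_run = (np.diff(windows, axis=1) == 1).all(axis=1)
print(is_run)  # [ True False False]
```

Only the first window passes, which is the same test the answer applies to the dates after converting the differences to whole days.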
print (df)
DateEntry Person
1 2018-03-18 A
2 2018-03-19 A
3 2018-03-21 A
4 2018-08-25 B
5 2018-08-26 B
6 2018-08-27 B
7 2018-09-25 B
8 2018-09-26 B
9 2018-09-27 B
10 2018-09-30 B
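df['DateEntry'] = pd.to_datetime(df['DateEntry'])
# Sort so that each person's dates are in ascending order.
df = df.sort_values(['Person','DateEntry'])

def rolling_window(a, window):
    # Each row of the result is one sliding window over `a`;
    # max(..., 0) gives an empty result for groups shorter than `window`.
    shape = a.shape[:-1] + (max(a.shape[-1] - window + 1, 0), window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def f(x):
    vals = rolling_window(x.to_numpy(), 3)
    # True where neighbouring dates in a window are 1 whole day apart.
    dif = np.diff(vals, axis=1).astype("timedelta64[D]")==np.array([1], dtype='timedelta64[D]')
    # Keep only the windows in which both gaps are 1 day.
    return pd.DataFrame(vals[dif.all(axis=1)])

Then apply the function to each person's dates:
df1 = df.groupby('Person')['DateEntry'].apply(f)
print (df1)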
                  0          1          2
Person
B      0 2018-08-25 2018-08-26 2018-08-27
       1 2018-09-25 2018-09-26 2018-09-27
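The question asks for printed sentences rather than a DataFrame. One way to produce them is sketched below. It swaps the strides approach for a plain pandas diff/cumsum grouping, and it assumes at most one entry per person per day. The helper name `first_run` is hypothetical, and the sample data is the six-row frame from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'DateEntry': ['2018-03-18', '2018-03-19', '2018-03-21',
                  '2018-09-25', '2018-09-26', '2018-09-27'],
    'Person': ['A', 'A', 'A', 'B', 'B', 'B'],
})
df['DateEntry'] = pd.to_datetime(df['DateEntry'])
df = df.sort_values(['Person', 'DateEntry'])

def first_run(dates, length=3):
    # Hypothetical helper: return the first `length` consecutive days, or None.
    # A new run starts wherever the gap to the previous date is not 1 day.
    run_id = (dates.diff() != pd.Timedelta(days=1)).cumsum()
    for _, run in dates.groupby(run_id):
        if len(run) >= length:
            return run.iloc[:length]
    return None

lines = []
for person, dates in df.groupby('Person')['DateEntry']:
    run = first_run(dates)
    if run is None:
        lines.append(f"Person {person} did not enter on 3 consecutive days.")
    else:
        lines.append(f"Person {person} did enter on three consecutive days.")
        lines.append(f"Consecutive days entered by person {person}:")
        lines.extend(run.dt.strftime('%Y-%m-%d'))

print("\n".join(lines))
```

On the question's data this prints the expected output verbatim. For a person with several qualifying runs, only the first one is reported.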