Counting in pandas

Question  Votes: -2  Answers: 1

I want to go from the following df:

import pandas as pd
df = pd.DataFrame({'Start Time': ['27/02/2018 12:56', '27/02/2018 12:56', '27/02/2018 12:51', '28/02/2018 12:51', '28/02/2018 12:46', '28/02/2018 12:46', '28/02/2018 12:41', '28/02/2018 12:41', '01/03/2018 12:36', '01/03/2018 12:36', '01/03/2018 12:31', '01/03/2018 12:31', '02/03/2018 12:27', '02/03/2018 12:27', '02/03/2018 12:27', '02/03/2018 12:27'], 'Event_type': ['Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer'], 'Status': ['S', 'S', 'S', 'S', 'F', 'S', 'F', 'S', 'F', 'S', 'S', 'F', 'S', 'S', 'F', 'F'], 'Job Number': [1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0]}, columns=['Job Number','Start Time','Event_type','Status'])

print (df)
      Job Number        Start Time Event_type Status
0   1.000000e+12  27/02/2018 12:56   Transfer      S
1   1.000000e+12  27/02/2018 12:56   Transfer      S
2   1.000000e+12  27/02/2018 12:51   Transfer      S
3   1.000000e+12  28/02/2018 12:51   Transfer      S
4   1.000000e+12  28/02/2018 12:46   Transfer      F
5   1.000000e+12  28/02/2018 12:46   Transfer      S
6   1.000000e+12  28/02/2018 12:41   Transfer      F
7   1.000000e+12  28/02/2018 12:41   Transfer      S
8   1.000000e+12  01/03/2018 12:36   Transfer      F
9   1.000000e+12  01/03/2018 12:36   Transfer      S
10  1.000000e+12  01/03/2018 12:31   Transfer      S
11  1.000000e+12  01/03/2018 12:31   Transfer      F
12  1.000000e+12  02/03/2018 12:27   Transfer      S
13  1.000000e+12  02/03/2018 12:27   Transfer      S
14  1.000000e+12  02/03/2018 12:27   Transfer      F
15  1.000000e+12  02/03/2018 12:27   Transfer      F

to:

Status       F   S  Grand Total
Start Time                     
2018-01-03   2   2            4
2018-02-03   2   2            4
2018-02-27   0   3            3
2018-02-28   2   3            5
Grand Total  6  10           16

What I need to do is count the Destination File Name entries flagged 'S' that occurred on a given date. Status can only be 'S' or 'F'.

The code I have used so far is:

df = pd.read_csv('JobFileAuditLogs20180227_B.csv', encoding='utf-8') 

df['Start Time'] = pd.to_datetime(df['Start Time']).dt.date

df.to_csv('JobFileAuditLogs20180227_C.csv', sep=',', encoding='utf-8')
df = pd.read_csv('JobFileAuditLogs20180227_C.csv', index_col='Start Time', 
encoding='utf-8') 

df[['Status', 'Destination File Name']]

I tried using

df['Status'].value_counts()   

but this only gives the total number of S and F occurrences, not how many occurred on each day.

I don't know where to go from here; any help would be great.

python pandas datetime count crosstab
1 Answer
Votes: 1

I believe you need crosstab:

df = pd.crosstab(pd.to_datetime(df['Start Time']).dt.date,
                 df['Status'], 
                 margins=True,
                 margins_name='Grand Total')
print (df)

Status       F   S  Grand Total
Start Time                     
2018-01-03   2   2            4
2018-02-03   2   2            4
2018-02-27   0   3            3
2018-02-28   2   3            5
Grand Total  6  10           16
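One thing worth checking: the sample timestamps look day-first (27/02/2018), yet the table above labels two rows 2018-01-03 and 2018-02-03, which suggests 01/03/2018 and 02/03/2018 were read month-first. The following is a minimal sketch, assuming the dates really are in day/month/year order, that passes an explicit format to pd.to_datetime and also counts only the 'S' rows per day. The names start, status, dates, table and s_per_day are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild the Start Time and Status columns from the question's sample data.
start = (['27/02/2018 12:56'] * 2 + ['27/02/2018 12:51'] + ['28/02/2018 12:51']
         + ['28/02/2018 12:46'] * 2 + ['28/02/2018 12:41'] * 2
         + ['01/03/2018 12:36'] * 2 + ['01/03/2018 12:31'] * 2
         + ['02/03/2018 12:27'] * 4)
status = list('SSSSFSFSFSSFSSFF')
df = pd.DataFrame({'Start Time': start, 'Status': status})

# Assumption: timestamps are day/month/year. An explicit format avoids
# 01/03/2018 being interpreted as 3 January instead of 1 March.
dates = pd.to_datetime(df['Start Time'], format='%d/%m/%Y %H:%M').dt.date

# Same crosstab as in the answer, applied to the explicitly parsed dates.
table = pd.crosstab(dates, df['Status'],
                    margins=True, margins_name='Grand Total')
print(table)

# If only the per-day number of 'S' rows is needed:
s_per_day = dates[df['Status'] == 'S'].value_counts().sort_index()
print(s_per_day)
```

Under that assumption the per-day counts stay the same as in the table above, but the two March rows are labelled 2018-03-01 and 2018-03-02 and sort after the February rows.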