I currently have a text file made up of lines, each recording the timestamp and the name of the person who sent a message. See below:
8/29/19, 2:03 PM - Michael: ...
8/29/19, 3:05 PM - Frank: ...
8/29/19, 4:01 PM - Tom: ...
8/29/19, 5:26 PM - Amy: ...
8/29/19, 6:46 PM - Tom: ...
8/29/19, 7:24 PM - Frank: ...
8/29/19, 9:55 PM - Amy: ...
8/30/19, 11:35 AM - Frank: ...
8/30/19, 12:39 PM - Johnny: ...
9/3/19, 1:18 AM - Frank: ...
9/3/19, 2:23 AM - Frank: ...
9/3/19, 3:16 PM - Frank: ...
9/3/19, 4:53 PM - Johnny: ...
9/4/19, 9:01 AM - Frank: ...
9/4/19, 11:45 AM - Frank: ...
9/4/19, 1:04 PM - Johnny: ...
9/4/19, 1:42 PM - Johnny: ...
9/4/19, 2:03 PM - Amy: ...
9/4/19, 4:12 PM - Johnny: ...
9/4/19, 6:27 PM - Amy: ...
9/4/19, 9:08 PM - Johnny: ...
. . .
. . .
. . .
I want to count, in Python, how many messages each person sent on each date. I would like to output the following:
            Michael  Frank  Tom  Amy  Johnny
8/29/2019         1      2    2    2       0
8/30/2019         0      1    0    0       1
8/31/2019         0      0    0    0       0
9/1/2019          0      0    0    0       0
9/2/2019          0      0    0    0       0
9/3/2019          0      3    0    0       1
9/4/2019          0      2    0    2       4
9/5/2019
9/6/2019
9/7/2019
9/8/2019
First-time poster here, so please forgive me if the formatting is off. Thank you very much.
One way to do this while iterating over the file only once is with a defaultdict:
from collections import defaultdict

# occurrences[date][name] -> number of messages
occurrences = defaultdict(lambda: defaultdict(int))

with open('filename.txt', 'r') as f:
    for line in f.readlines():
        date = line.split(', ')[0]
        name = line.split(' - ')[1].split(': ')[0]
        occurrences[date][name] += 1
occurrences will then contain the following data:
8/29/19: {'Michael': 1, 'Frank': 2, 'Tom': 2, 'Amy': 2}
8/30/19: {'Frank': 1, 'Johnny': 1}
9/3/19: {'Frank': 3, 'Johnny': 1}
9/4/19: {'Frank': 2, 'Johnny': 4, 'Amy': 2}
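The split-based parsing above relies on the message text never containing ' - ' before the name and on every line being a message line. As a minimal sketch of a stricter variant, the version below matches each line against a regular expression and skips anything that does not fit. The pattern, the count_messages helper, and the sample lines are assumptions made for illustration, based only on the line layout shown in the question.

```python
import re
from collections import defaultdict

# Assumed layout of a message line: "8/29/19, 2:03 PM - Michael: message text"
LINE_RE = re.compile(r'^(\d{1,2}/\d{1,2}/\d{2}), \d{1,2}:\d{2} [AP]M - ([^:]+): ')


def count_messages(lines):
    """Return {date_string: {name: message_count}} for the matching lines."""
    occurrences = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a message line (blank line, placeholder, ...)
        date, name = match.groups()
        occurrences[date][name] += 1
    return occurrences


# Hypothetical sample; the second message contains ':' and ' - ' in its text.
sample = [
    '8/29/19, 2:03 PM - Michael: ...',
    '8/29/19, 3:05 PM - Frank: see you at 5:30 - ok?',
    '8/29/19, 7:24 PM - Frank: ...',
    '. . .',
    '8/30/19, 12:39 PM - Johnny: ...',
]
counts = count_messages(sample)
print({date: dict(names) for date, names in counts.items()})
```

For a real file, passing the open file object (count_messages(f)) should work the same way, since the function only iterates over lines.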
Edit: this prints the exact output the OP asked for:
from collections import defaultdict
from datetime import datetime, timedelta

occurrences = defaultdict(lambda: defaultdict(int))

with open('filename.txt', 'r') as f:
    lines = f.readlines()

# Work with calendar dates rather than datetimes, so the time of day
# of the first and last message cannot shorten the range.
start_date = lines[0].split(' - ')[0]
start_date = datetime.strptime(start_date, '%m/%d/%y, %I:%M %p').date()
end_date = lines[-1].split(' - ')[0]
end_date = datetime.strptime(end_date, '%m/%d/%y, %I:%M %p').date()

dates = []
# +1 so that the last day is included.
for n in range((end_date - start_date).days + 1):
    dates.append(start_date + timedelta(n))

authors = []  # a list keeps the names in order of first appearance
for line in lines:
    name = line.split(' - ')[1].split(': ')[0]
    if name not in authors:
        authors.append(name)
    date = line.split(' - ')[0]
    date = datetime.strptime(date, '%m/%d/%y, %I:%M %p').date()
    occurrences[date][name] += 1

print('\t\t', end='')
for name in authors:
    print(name, end='\t')
print()
for date in dates:
    # Unpadded month/day and a four-digit year, as in the question.
    print(f'{date.month}/{date.day}/{date.year}', end='\t')
    for name in authors:
        print(occurrences[date][name], end='\t')
    print()
This solution leaves room for improvement, since it does not take performance into account at all.
You can use pandas to help:
from io import StringIO
import pandas as pd
txtfile=StringIO("""8/29/19, 2:03 PM - Michael: ...
8/29/19, 3:05 PM - Frank: ...
8/29/19, 4:01 PM - Tom: ...
8/29/19, 5:26 PM - Amy: ...
8/29/19, 6:46 PM - Tom: ...
8/29/19, 7:24 PM - Frank: ...
8/29/19, 9:55 PM - Amy: ...
8/30/19, 11:35 AM - Frank: ...
8/30/19, 12:39 PM - Johnny: ...
9/3/19, 1:18 AM - Frank: ...
9/3/19, 2:23 AM - Frank: ...
9/3/19, 3:16 PM - Frank: ...
9/3/19, 4:53 PM - Johnny: ...
9/4/19, 9:01 AM - Frank: ...
9/4/19, 11:45 AM - Frank: ...
9/4/19, 1:04 PM - Johnny: ...
9/4/19, 1:42 PM - Johnny: ...
9/4/19, 2:03 PM - Amy: ...
9/4/19, 4:12 PM - Johnny: ...
9/4/19, 6:27 PM - Amy: ...
9/4/19, 9:08 PM - Johnny: ...""")
df = pd.read_csv(txtfile, sep=',|-|:', header=None, index_col=[0], engine='python')
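# Column 3 holds the name; strip the padding left by the split, one-hot encode it,
# then sum per date. (sum(level=0) was removed in pandas 2.0; use groupby instead.)
df_out = df[3].str.strip().str.get_dummies().groupby(level=0).sum()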
print(df_out)
Output:
         Amy  Frank  Johnny  Michael  Tom
0
8/29/19    2      2       0        1    2
8/30/19    0      1       1        0    0
9/3/19     0      3       1        0    0
9/4/19     2      2       4        0    0
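The table above only lists dates on which at least one message was sent, whereas the question also wants zero rows for the days in between. One possible way to get those, sketched here under the assumption that every line follows the layout from the question, is to parse the timestamps into real dates, build the counts with pd.crosstab, and reindex over a daily pd.date_range. The shortened sample and the column names (timestamp, name, text, date) are chosen for illustration only.

```python
from io import StringIO

import pandas as pd

# Shortened, hypothetical sample with a gap on 8/31/19.
sample = StringIO("""8/29/19, 2:03 PM - Michael: ...
8/29/19, 3:05 PM - Frank: ...
8/30/19, 12:39 PM - Johnny: ...
9/1/19, 1:18 AM - Frank: ...
""")

# Split each line into timestamp, name, and message text.
parts = pd.Series(sample.read().splitlines()).str.extract(
    r'^(?P<timestamp>[^-]+) - (?P<name>[^:]+): (?P<text>.*)$')

# Parse the timestamp and keep only the calendar day.
parts['date'] = pd.to_datetime(
    parts['timestamp'], format='%m/%d/%y, %I:%M %p').dt.normalize()

# Count messages per (date, name), then add the missing days as zero rows.
table = pd.crosstab(parts['date'], parts['name'])
full_range = pd.date_range(table.index.min(), table.index.max(), freq='D')
table = table.reindex(full_range, fill_value=0)
print(table)
```

Columns come out in alphabetical order; if the order of first appearance matters, selecting them explicitly (for example table[['Michael', 'Frank', 'Johnny']]) would be one way to rearrange them.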