I'm trying to sort a CSV first by date and then by time. With pandas this is easy: df = df.sort_values(by=['Date', 'Time_UTC']). With the csv library, my code is (taken from here):
import csv
from datetime import datetime

with open('eqph_csv_29May2020_noF_5lines.csv') as file:
    reader = csv.DictReader(file, delimiter=',')
    date_sorted = sorted(reader, key=lambda Date: datetime.strptime('Date', '%Y-%m-%d'))
    print(date_sorted)
The datetime documentation clearly shows that these format codes are correct. Here is a sample CSV (shown without delimiters):
Date Time_UTC Latitude Longitude
2020-05-28 05:17:31 16.63 120.43
2020-05-23 02:10:27 15.55 121.72
2020-05-20 12:45:07 5.27 126.11
2020-05-09 19:18:12 14.04 120.55
2020-04-10 18:45:49 5.65 126.54
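csv.DictReader returns an iterator that yields a dict for each row in the CSV file. To sort on one column of each row, you need to look that column up in the key function passed to sorted: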
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'], '%Y-%m-%d'))
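As a minimal, self-contained sketch of the difference: the key in the question ignores its argument and parses the literal text 'Date', which cannot match the format, while the corrected key parses each row's Date value. This assumes the real file is comma-delimited; the sample rows are inlined through io.StringIO instead of being read from disk, and SAMPLE, literal_parse_failed and sorted_dates are names made up for illustration.

```python
import csv
import io
from datetime import datetime

# Sample rows from the question, assumed to be comma-delimited in the real file.
SAMPLE = """Date,Time_UTC,Latitude,Longitude
2020-05-28,05:17:31,16.63,120.43
2020-05-23,02:10:27,15.55,121.72
2020-05-20,12:45:07,5.27,126.11
"""

# The key from the question parses the literal string 'Date', not a row value.
try:
    datetime.strptime('Date', '%Y-%m-%d')
    literal_parse_failed = False
except ValueError:
    literal_parse_failed = True

# The corrected key looks up the Date column of each row dict.
reader = csv.DictReader(io.StringIO(SAMPLE))
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'], '%Y-%m-%d'))
sorted_dates = [row['Date'] for row in date_sorted]
print(literal_parse_failed, sorted_dates)
```

The lambda's parameter name is arbitrary; what matters is that the value passed to strptime comes from the row.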
To sort on both Date and Time_UTC, you can combine them into a single string and convert that to a datetime:
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'] + ' ' + row['Time_UTC'], '%Y-%m-%d %H:%M:%S'))
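A small sketch of the combined key in action, assuming a comma-delimited file. The two same-day rows use values that appear in the result listing further down; their input order, and the names SAMPLE, combined_key, by_datetime, by_strings and times, are made up for illustration. It also shows a possible alternative that skips parsing and compares (Date, Time_UTC) string tuples, which should give the same order only because both columns are zero-padded, ISO-style values.

```python
import csv
import io
from datetime import datetime

# Deliberately unsorted input: one later row, then two rows on the same day.
SAMPLE = """Date,Time_UTC,Latitude,Longitude
2020-04-10,18:45:49,5.65,126.54
2020-03-19,06:49:00,14.94,120.17
2020-03-19,06:33:00,15.0,119.79
"""

def combined_key(row):
    # Join both columns and parse them as a single datetime.
    return datetime.strptime(row['Date'] + ' ' + row['Time_UTC'], '%Y-%m-%d %H:%M:%S')

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_datetime = sorted(rows, key=combined_key)

# Alternative without parsing: compare (date, time) string tuples.
# This matches chronological order only for zero-padded ISO-style values.
by_strings = sorted(rows, key=lambda row: (row['Date'], row['Time_UTC']))

times = [(row['Date'], row['Time_UTC']) for row in by_datetime]
print(times)
```

Parsing to datetime has the side benefit of raising an error on malformed values, whereas the string comparison would silently sort them somewhere.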
Nick's answer worked, and I used it to revise my code. I switched to csv.reader() instead.
import csv
from datetime import datetime

lat, lon = [], []
xy = zip(lat, lon)  # zip is lazy, so it sees values appended later
with open('eqph_csv_29May2020_noF_20lines.csv') as file:
    reader = csv.reader(file, delimiter=',')
    next(reader)  # skip the header row
    date_sorted = sorted(reader, key=lambda row: datetime.strptime(
        row[0] + ' ' + row[1], '%Y-%m-%d %H:%M:%S'))
for row in date_sorted:
    lat.append(float(row[2]))  # column 2 is Latitude
    lon.append(float(row[3]))  # column 3 is Longitude
for i in xy:
    print(i)
Result:
(6.14, 126.2)
(14.09, 121.36)
(13.74, 120.9)
(6.65, 125.42)
(6.61, 125.26)
(5.49, 126.57)
(5.65, 125.61)
(11.33, 124.64)
(11.49, 124.42)
(15.0, 119.79) # 2020-03-19 06:33:00
(14.94, 120.17) # 2020-03-19 06:49:00
(6.7, 125.18)
(5.76, 125.14)
(9.22, 124.01)
(20.45, 122.12)
(5.65, 126.54)
(14.04, 120.55)
(5.27, 126.11)
(15.55, 121.72)
(16.63, 120.43)
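One detail of the revised code: xy is created while both lists are still empty, and the final loop still prints pairs because zip in Python 3 is lazy and reads the lists only when it is iterated. A zip object can also be consumed only once. A possible, more direct variant builds the pairs after sorting. This is only a sketch, assuming a comma-delimited file with the same column order; SAMPLE and pairs are names made up for illustration, and the rows are inlined through io.StringIO.

```python
import csv
import io
from datetime import datetime

# Three rows from the sample CSV, deliberately out of order.
SAMPLE = """Date,Time_UTC,Latitude,Longitude
2020-05-28,05:17:31,16.63,120.43
2020-05-09,19:18:12,14.04,120.55
2020-05-20,12:45:07,5.27,126.11
"""

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
date_sorted = sorted(
    reader,
    key=lambda row: datetime.strptime(row[0] + ' ' + row[1], '%Y-%m-%d %H:%M:%S'),
)

# Build the coordinate pairs directly from the sorted rows (columns 2 and 3).
pairs = [(float(row[2]), float(row[3])) for row in date_sorted]
for pair in pairs:
    print(pair)
```

Because pairs is a list, it can be iterated more than once, which a zip object does not allow.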