I have a CSV file, OutputA, in this format:
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
I want to produce CSV output that gives each team's total points, each team's average points, and its number of riders.
So the output would be:
Team,Points,AvgPoints,NumOfRiders
Team1,190,95,2
Team2,95,95,1
I use this function to convert each row into a named tuple:
fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)

def csv_to_tuple(path):
    with open(path, 'r', errors='ignore') as file:
        reader = csv.reader(file)
        for row in map(Results._make, reader):
            yield row
I then sort the rows by club to get a sorted list:
moutputA = sorted(list(csv_to_tuple("Male/outputA.csv")), key=lambda k: k[3])
This returns a list like:
[CategoryResults(Position='13', Category='A', Name='Marek', Team='1', Points='48'), CategoryResults(Position='7', Category='A', Name='', Team='1', Points='70')]
I believe this is correct so far, though I may be wrong.
I am trying to create a new list of each team's points (not yet summed), for example:
[Team 1(1,2,3,4,5)]
[Team 2 (6,9,10)]
etc.
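My idea is that I can then count how many unique points values each team has (which equals its number of riders). However, when I try to group the list, I have this code: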
Clubs = []
Club_Points = []
for Names, Club in groupby(moutputA, lambda x: x[3]):
    for Teams in Names:
        Clubs.append(list(Teams))

for Club, Points in groupby(moutputA, lambda x: x[4]):
    for Point in Clubs:
        Club_Points.append(list(Point))
print(Clubs)
But it returns this error:
Teams.append(list(Team))
AttributeError: 'itertools._grouper' object has no attribute 'append'
If data.csv contains:
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
then this script:
import csv
from collections import namedtuple
from itertools import groupby
from statistics import mean

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)

def csv_to_tuple(path):
    with open(path, 'r', errors='ignore') as file:
        next(file)  # skip header
        reader = csv.reader(file)
        for row in map(Results._make, reader):
            yield row

moutputA = sorted(csv_to_tuple("data.csv"), key=lambda k: k.Team)

out = []
for team, group in groupby(moutputA, lambda x: x.Team):
    group = list(group)
    d = {}
    d['Team'] = team
    d['Points'] = sum(int(i.Points) for i in group)
    d['AvgPoints'] = mean(int(i.Points) for i in group)
    d['NumOfRider'] = len(group)
    out.append(d)

with open('data_out.csv', 'w', newline='') as csvfile:
    fieldnames = ['Team', 'Points', 'AvgPoints', 'NumOfRider']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in out:
        writer.writerow(row)
produces data_out.csv:
Team,Points,AvgPoints,NumOfRider
Team 1,190,95,2
Team 2,95,95,1
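As a side note, groupby only merges adjacent rows, which is why the script sorts by team first. Here is a minimal sketch of an alternative that skips the sort by collecting points in a dictionary. The function name `summarize` and the use of `csv.DictReader` are my own choices for illustration, not part of the script above, and the sample data is fed from a string rather than from data.csv.

```python
import csv
import io
from collections import defaultdict

def summarize(file):
    """Return one summary dict per team, in first-seen order."""
    points_by_team = defaultdict(list)
    # DictReader consumes the header row itself, so nothing needs skipping
    for row in csv.DictReader(file):
        points_by_team[row["Team"]].append(int(row["Points"]))
    return [
        {
            "Team": team,
            "Points": sum(points),
            "AvgPoints": sum(points) / len(points),
            "NumOfRider": len(points),
        }
        for team, points in points_by_team.items()
    ]

# Hypothetical in-memory stand-in for data.csv
sample = io.StringIO(
    "Position,Category,Name,Team,Points\n"
    "1,A,James,Team 1,100\n"
    "2,A,Mark,Team 2,95\n"
    "3,A,Tom,Team 1,90\n"
)
summary = summarize(sample)
```

The resulting list of dictionaries can be handed to the same `csv.DictWriter` loop. Note that the average computed this way is a float, so it would be written as 95.0 rather than 95.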
Here's a start. You should be able to work out how to get what you want from here.
import csv, io
from collections import namedtuple
from itertools import groupby

data = '''\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
'''
b = io.StringIO(data)
next(b)  # skip the header line

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)

def csv_to_tuple(file):
    reader = csv.reader(file)
    for row in map(Results._make, reader):
        yield row

rows = sorted(list(csv_to_tuple(b)), key=lambda k: k[3])

for TeamName, TeamRows in groupby(rows, lambda x: x[3]):
    print(TeamName)
    TeamPoints = [row.Points for row in TeamRows]
    print(TeamPoints)
    print()
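One possible way to carry on from there, as a rough sketch only: convert each team's points to int, then take the sum, the mean and the count. Keep in mind that groupby yields (key, group) pairs, so the team name comes first and the iterator over its rows second. The `summary` list and writing to `sys.stdout` are just for illustration; swap in a real file as needed.

```python
import csv
import io
import sys
from itertools import groupby
from operator import itemgetter

data = """\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
"""

reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header row
rows = sorted(reader, key=itemgetter(3))  # groupby needs rows sorted by team

summary = []
# Each iteration gives the team name and an iterator over that team's rows
for team, team_rows in groupby(rows, key=itemgetter(3)):
    points = [int(row[4]) for row in team_rows]
    summary.append([team, sum(points), sum(points) / len(points), len(points)])

writer = csv.writer(sys.stdout)
writer.writerow(["Team", "Points", "AvgPoints", "NumOfRiders"])
writer.writerows(summary)
```

The group iterator can only be consumed once, which is why the points are copied into a list before being summed and counted.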
All of this becomes much easier if you just use pandas. See the code below.
import pandas as pd
import numpy as np

# input_path and output_path are placeholders: set them to your own file names
input_path = "data.csv"
output_path = "data_out.csv"

df = pd.read_csv(input_path)
teams = list(set(df['Team']))  # unique list of all the teams
num_teams = len(teams)
points = np.empty(shape=num_teams)
avg_points = np.empty(shape=num_teams)
num_riders = np.empty(shape=num_teams)

for i in range(num_teams):
    # find all rows where the entry in the 'Team' column
    # is the same as teams[i]
    req = df.loc[df['Team'] == teams[i]]
    points[i] = np.sum(req['Points'])
    num_riders[i] = len(req)
    avg_points[i] = points[i] / num_riders[i]

dict_out = {
    'Team': teams,
    'Points': points,
    'AvgPoints': avg_points,
    'NumOfRiders': num_riders
}
df_out = pd.DataFrame(data=dict_out)
# index=False keeps the row index out of the CSV
df_out.to_csv(output_path, index=False)
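The loop can probably be replaced by pandas' own `groupby` with named aggregation, which computes all three columns in one call. This is a minimal sketch under the assumption that the input has the `Team` and `Points` columns shown in the question; the in-memory sample stands in for the real input file.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the input CSV
data = io.StringIO(
    "Position,Category,Name,Team,Points\n"
    "1,A,James,Team 1,100\n"
    "2,A,Mark,Team 2,95\n"
    "3,A,Tom,Team 1,90\n"
)
df = pd.read_csv(data)

# Named aggregation: new_column=(source_column, aggregation_function)
summary = df.groupby("Team", as_index=False).agg(
    Points=("Points", "sum"),
    AvgPoints=("Points", "mean"),
    NumOfRiders=("Points", "count"),
)

# With no path argument, to_csv returns the CSV text as a string
print(summary.to_csv(index=False))
```

Unlike `itertools.groupby`, the pandas version does not need the rows sorted beforehand, and `count` here counts non-missing `Points` values per team.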