How to group a list by value without raising an AttributeError

Question  Votes: 0  Answers: 1

I have a CSV file, outputA.csv, in this format:

Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90

I want to produce a CSV output that gives each team's total points, each team's average points, and its number of riders.

So the output would be:

Team,Points,AvgPoints,NumOfRiders
Team1,190,95,2
Team2,95,95,1

I use this function to convert each row into a namedtuple:

import csv
from collections import namedtuple

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)

def csv_to_tuple(path):
    with open(path, 'r', errors='ignore') as file:
        reader = csv.reader(file)
        for row in map(Results._make, reader):
            yield row

The rows are then sorted by club (team) to form a sorted list:

moutputA = sorted(list(csv_to_tuple("Male/outputA.csv")), key=lambda k: k[3])

This returns a list such as:

[CategoryResults(Position='13', Category='A', Name='Marek', Team='1', Points='48'), CategoryResults(Position='7', Category='A', Name='', Team='1', Points='70')]

I believe this is correct so far, though I may be wrong.

I am trying to build a new list of each team's points (not yet summed), for example:

[Team 1(1,2,3,4,5)]
[Team 2 (6,9,10)]
etc.

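My idea is that I can then count how many points values each team has (which equals its number of riders). However, when I try to group the list, I have this code: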

Clubs = []
Club_Points = []
for Names, Club in groupby(moutputA, lambda x: x[3]):
    for Teams in Names:
        Clubs.append(list(Teams))

for Club, Points in groupby(moutputA, lambda x: x[4]):
    for Point in Clubs:
        Club_Points.append(list(Point))

print(Clubs)

But it returns this error:

    Teams.append(list(Team))
AttributeError: 'itertools._grouper' object has no attribute 'append'
python tuples itertools
1 Answer

1 vote

If data.csv contains:

Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90

then this script:

import csv
from collections import namedtuple
from itertools import groupby
from statistics import mean

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)

def csv_to_tuple(path):
    with open(path, 'r', errors='ignore') as file:
        next(file) # skip header
        reader = csv.reader(file)
        for row in map(Results._make, reader):
            yield row

moutputA = sorted(csv_to_tuple("data.csv"), key=lambda k: k.Team)

out = []
for team, group in groupby(moutputA, lambda x: x.Team):
    group = list(group)
    d = {}
    d['Team'] = team
    d['Points'] = sum(int(i.Points) for i in group)
    d['AvgPoints'] = mean(int(i.Points) for i in group)
    d['NumOfRider'] = len(group)
    out.append(d)


with open('data_out.csv', 'w', newline='') as csvfile:
    fieldnames = ['Team', 'Points', 'AvgPoints', 'NumOfRider']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    for row in out:
        writer.writerow(row)

produces data_out.csv:

Team,Points,AvgPoints,NumOfRider
Team 1,190,95,2
Team 2,95,95,1
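
As a side note on why the original attempt failed: groupby yields (key, group) pairs, and each group is a lazy itertools._grouper iterator rather than a list, so calling .append on it raises the AttributeError shown in the question. The minimal sketch below uses made-up (team, points) pairs, which are an assumption for illustration only, and materializes each group into a list before using it. It also assumes the rows are already sorted by team, since groupby only merges adjacent items with equal keys.

```python
from itertools import groupby

# Hypothetical sample rows as (team, points) pairs, already sorted by team.
rows = [("Team 1", 100), ("Team 1", 90), ("Team 2", 95)]

points_by_team = {}
for team, group in groupby(rows, key=lambda r: r[0]):
    # `group` is a one-shot iterator with no .append method,
    # so build a real list from it before doing anything else.
    points_by_team[team] = [points for _, points in group]

print(points_by_team)
```

Once each team maps to a plain list, len() gives the number of riders and sum() gives the total.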



1 vote

This is a start. You should be able to work out how to get what you want from here.

import csv, io
from collections import namedtuple
from itertools import groupby

data = '''\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
'''

b = io.StringIO(data)
next(b)

fields = ("Position", "Category", "Name", "Team", "Points")
Results = namedtuple('CategoryResults', fields)


def csv_to_tuple(file):
    reader = csv.reader(file)
    for row in map(Results._make, reader):
        yield row


rows = sorted(list(csv_to_tuple(b)), key=lambda k: k[3])

for TeamName, TeamRows in groupby(rows, lambda x: x[3]):
    print(TeamName)
    TeamPoints = [row.Points for row in TeamRows]
    print(TeamPoints)
    print()
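
One possible way to finish from here, offered as a sketch rather than the answerer's own code: convert each team's points to integers, then compute the total, average, and rider count. This version swaps in csv.DictReader (which consumes the header row itself) instead of the namedtuple helper above, and writes to an in-memory io.StringIO buffer instead of a file; both choices are assumptions made to keep the example self-contained.

```python
import csv
import io
from itertools import groupby
from statistics import mean

data = """\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
"""

# DictReader uses the header row for keys, so no manual skipping is needed.
rows = sorted(csv.DictReader(io.StringIO(data)), key=lambda r: r["Team"])

summary = []
for team, team_rows in groupby(rows, key=lambda r: r["Team"]):
    # csv yields strings, so convert before doing arithmetic.
    points = [int(r["Points"]) for r in team_rows]
    summary.append((team, sum(points), mean(points), len(points)))

# Write the summary as CSV text (an in-memory buffer stands in for a file).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Team", "Points", "AvgPoints", "NumOfRiders"])
writer.writerows(summary)
print(out.getvalue())
```

To write a real file, the buffer could be replaced with open(path, 'w', newline='').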


1 vote

All of this becomes much easier if you just use pandas. See the code below.

import pandas as pd
import numpy as np

df = pd.read_csv(input_path)

teams = list(set(df['Team'])) # unique list of all the teams
num_teams = len(teams)

points = np.empty(shape=num_teams)
avg_points = np.empty(shape=num_teams)
num_riders = np.empty(shape=num_teams)

for i in range(num_teams):
    # find all rows where the entry in the 'Team' column
    # is the same as teams[i]
    req = df.loc[df['Team'] == teams[i]]
    points[i] = np.sum(req['Points'])
    num_riders[i] = len(req)
    avg_points[i] = points[i]/num_riders[i]

dict_out = {
    'Team':teams,
    'Points':points,
    'AvgPoints':avg_points,
    'NumOfRiders':num_riders
}
df_out = pd.DataFrame(data=dict_out)
df_out.to_csv(output_path, index=False)  # index=False keeps the row index out of the CSV
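
The explicit loop above can probably be collapsed with pandas' own groupby. The following is a minimal sketch, not a replacement mandated by the answer: it assumes the sample data from the question, read from an inline string rather than from input_path, and uses named aggregation to produce the three requested columns.

```python
import io

import pandas as pd

data = """\
Position,Category,Name,Team,Points
1,A,James,Team 1,100
2,A,Mark,Team 2,95
3,A,Tom,Team 1,90
"""

df = pd.read_csv(io.StringIO(data))

# Group by team and compute total, mean, and row count of Points in one pass.
summary = (
    df.groupby("Team")["Points"]
      .agg(Points="sum", AvgPoints="mean", NumOfRiders="count")
      .reset_index()
)

# index=False leaves the DataFrame's row index out of the CSV text.
print(summary.to_csv(index=False))
```

Note that the mean is computed as a float, so AvgPoints appears as 95.0 rather than 95 unless it is cast or formatted afterwards.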