Suppose I have a sample list like this:
l = [
['Visitors', '1 February 2020', 'Saturday', 'Shop A', 'In', '100', '20', '30','150', 'Out', '90', '10', '15', '115'],
['Visitors', '1 February 2020', 'Saturday', 'Shop B', 'In', '20', '10', '40', '70', 'Out', '10', '9', '0', '19'],
['Visitors', '1 February 2020', 'Saturday', 'Shop C', 'In', '42', '18', '20', '80', 'Out', '40', '10', '20', '70'],
['Visitors', '1 February 2020', 'Saturday', 'Shop D', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Shop E', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Shop F', 'In', '20', '19', '11', '50', 'Out', '10', '9', '5', '24'],
['Visitors', '1 February 2020', 'Saturday', 'Shop G', 'In', '25', '8', '33', '66', 'Out', '20', '6', '30', '56'],
['Visitors', '1 February 2020', 'Saturday', 'Shop H', 'In', '180', '88', '6', '274', 'Out', '170', '80', '5', '255'],
['Visitors', '1 February 2020', 'Saturday', 'Shop I', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Total', 'In', '387', '163', '140', '690', 'Out', '340', '124', '75', '539'],
]
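The numbers show how many men/women/children visit a shop each day, recorded separately for people entering (In) and leaving (Out). Each row of the list above can be read as follows: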
[Ppl_type, Date, Weekday, Shop, In, Men, Women, Children, Total, Out, Men, Women, Children, Total]
This is the result I would like to see: the list above written to Excel with the following header:
header= ['Ppl_type', 'Date', 'Weekday', 'Shop', 'In/Out', 'Visitor_Type', 'Number']
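So each shop will have six rows (three "In" rows and three "Out" rows) summarizing the numbers above.
I would like to know how to do this in Python and get the result into Excel. I tried worksheet.write, but it only seems to work for the first four columns. Many thanks.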
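For a fully programmatic solution, you can use:
import pandas as pd
header = ['Ppl_type', 'Date', 'Weekday', 'Shop',
          'In', 'Men', 'Women', 'Children', 'Total',
          'Out', 'Men', 'Women', 'Children', 'Total']
df = pd.DataFrame(l, columns=header)
# m1: True for the 'In'/'Out' marker columns
m1 = df.columns.isin(['In', 'Out'])
# grp: the marker each column belongs to (NaN for the identifier columns)
grp = df.columns.to_series().where(m1).ffill()
# m2: True for every column inside an In/Out block
m2 = grp.notna()
# m: True for the value columns only (block columns minus the markers)
m = m2 & ~m1
out = (
    df.loc[:, m2==m]  # keep identifier and value columns, drop the markers
      .set_index(list(grp[~m2].index))
      .astype(int)
      .set_axis(pd.MultiIndex.from_arrays([df.columns[m], grp[m]],
                                          names=('Visitor_Type', 'In/Out')), axis=1)
      .stack(['In/Out', 'Visitor_Type']).reset_index(name='Number')
      # uncomment the line below to remove the Total
      #.loc[lambda d: d['Visitor_Type'].ne('Total')]
)
Output: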
Ppl_type Date Weekday Shop In/Out Visitor_Type Number
0 Visitors 1 February 2020 Saturday Shop A In Children 30
1 Visitors 1 February 2020 Saturday Shop A In Men 100
2 Visitors 1 February 2020 Saturday Shop A In Total 150
3 Visitors 1 February 2020 Saturday Shop A In Women 20
4 Visitors 1 February 2020 Saturday Shop A Out Children 15
5 Visitors 1 February 2020 Saturday Shop A Out Men 90
6 Visitors 1 February 2020 Saturday Shop A Out Total 115
7 Visitors 1 February 2020 Saturday Shop A Out Women 10
8 Visitors 1 February 2020 Saturday Shop B In Children 40
9 Visitors 1 February 2020 Saturday Shop B In Men 20
10 Visitors 1 February 2020 Saturday Shop B In Total 70
...
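The key step in the answer above is labelling every column with the In/Out block it belongs to. A minimal sketch of just that step, assuming the same header; it uses a plain pd.Series of the names instead of df.columns.to_series(), and the names cols, is_marker and block are illustrative only:

```python
import pandas as pd

header = ['Ppl_type', 'Date', 'Weekday', 'Shop',
          'In', 'Men', 'Women', 'Children', 'Total',
          'Out', 'Men', 'Women', 'Children', 'Total']

# Plain 0..13 index, so the duplicated names do not matter here.
cols = pd.Series(header)

# True only at the 'In' and 'Out' marker columns.
is_marker = cols.isin(['In', 'Out'])

# Keep the marker names, blank out everything else, then carry each
# marker forward over the columns that follow it.
block = cols.where(is_marker).ffill()

print(block.tolist())
```

The first four entries stay missing (the identifier columns); the next five are labelled 'In' and the last five 'Out', which is what lets the answer tell the two Men/Women/Children/Total groups apart.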
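You can rename the columns with the suffixes 1 and 2 to get rid of the duplicated column names; after that, if the order of the original data has to be kept, you can reshape with wide_to_long and DataFrame.stack: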
L = [['Visitors', '1 February 2020', 'Saturday', 'Shop A', 'In', '100', '20', '30','150', 'Out', '90', '10', '15', '115'],
['Visitors', '1 February 2020', 'Saturday', 'Shop B', 'In', '20', '10', '40', '70', 'Out', '10', '9', '0', '19'],
['Visitors', '1 February 2020', 'Saturday', 'Shop C', 'In', '42', '18', '20', '80', 'Out', '40', '10', '20', '70'],
['Visitors', '1 February 2020', 'Saturday', 'Shop D', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Shop E', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Shop F', 'In', '20', '19', '11', '50', 'Out', '10', '9', '5', '24'],
['Visitors', '1 February 2020', 'Saturday', 'Shop G', 'In', '25', '8', '33', '66', 'Out', '20', '6', '30', '56'],
['Visitors', '1 February 2020', 'Saturday', 'Shop H', 'In', '180', '88', '6', '274', 'Out', '170', '80', '5', '255'],
['Visitors', '1 February 2020', 'Saturday', 'Shop I', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Total', 'In', '387', '163', '140', '690', 'Out', '340', '124', '75', '539']]
import pandas as pd
cols = ['Ppl_type', 'Date', 'Weekday', 'Shop',
        'In/Out1', 'Men1', 'Women1', 'Children1', 'Total1',
        'In/Out2', 'Men2', 'Women2', 'Children2', 'Total2']
df = pd.DataFrame(L, columns=cols)
print(df)
df = (pd.wide_to_long(df,
                      stubnames=['In/Out','Men','Women','Children','Total'],
                      i=['Ppl_type', 'Date', 'Weekday', 'Shop'],
                      j='tmp').set_index('In/Out', append=True)
        .droplevel(-2)
        .rename_axis('Visitor_Type', axis=1)
        .stack()
        .reset_index(name='Number'))
print(df)
Ppl_type Date Weekday Shop In/Out Visitor_Type Number
0 Visitors 1 February 2020 Saturday Shop A In Men 100
1 Visitors 1 February 2020 Saturday Shop A In Women 20
2 Visitors 1 February 2020 Saturday Shop A In Children 30
3 Visitors 1 February 2020 Saturday Shop A In Total 150
4 Visitors 1 February 2020 Saturday Shop A Out Men 90
.. ... ... ... ... ... ... ...
75 Visitors 1 February 2020 Saturday Total In Total 690
76 Visitors 1 February 2020 Saturday Total Out Men 340
77 Visitors 1 February 2020 Saturday Total Out Women 124
78 Visitors 1 February 2020 Saturday Total Out Children 75
79 Visitors 1 February 2020 Saturday Total Out Total 539
[80 rows x 7 columns]
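The suffixed column names do not have to be typed by hand; they can be derived from the original header. A minimal sketch, assuming 'In' and 'Out' are the only block markers (the helper name suffixed is hypothetical, not part of pandas or of the answer):

```python
header = ['Ppl_type', 'Date', 'Weekday', 'Shop',
          'In', 'Men', 'Women', 'Children', 'Total',
          'Out', 'Men', 'Women', 'Children', 'Total']

def suffixed(names, markers=('In', 'Out')):
    """Rename each marker to 'In/Out<n>' and suffix the columns after it with <n>."""
    result, block = [], 0
    for name in names:
        if name in markers:
            block += 1                        # a new In/Out block starts here
            result.append(f'In/Out{block}')
        elif block:
            result.append(f'{name}{block}')   # value column inside block n
        else:
            result.append(name)               # identifier column before any block
    return result

cols = suffixed(header)
print(cols)
```

The resulting list matches the cols list written out in the answer, so it can be passed to pd.DataFrame(L, columns=cols) in the same way.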
If you need to remove Total from the final output (run this on the wide DataFrame built from L and cols, not on the already reshaped one):
df = (pd.wide_to_long(df,
                      stubnames=['In/Out','Men','Women','Children','Total'],
                      i=['Ppl_type', 'Date', 'Weekday', 'Shop'],
                      j='tmp').set_index('In/Out', append=True)
        .droplevel(-2)
        .rename_axis('Visitor_Type', axis=1)
        .stack()
        .reset_index(name='Number')
        .query('Visitor_Type != "Total"'))
print(df.head(10))
Ppl_type Date Weekday Shop In/Out Visitor_Type Number
0 Visitors 1 February 2020 Saturday Shop A In Men 100
1 Visitors 1 February 2020 Saturday Shop A In Women 20
2 Visitors 1 February 2020 Saturday Shop A In Children 30
4 Visitors 1 February 2020 Saturday Shop A Out Men 90
5 Visitors 1 February 2020 Saturday Shop A Out Women 10
6 Visitors 1 February 2020 Saturday Shop A Out Children 15
8 Visitors 1 February 2020 Saturday Shop B In Men 20
9 Visitors 1 February 2020 Saturday Shop B In Women 10
10 Visitors 1 February 2020 Saturday Shop B In Children 40
12 Visitors 1 February 2020 Saturday Shop B Out Men 10
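The question asks for an Excel file, and a long DataFrame like the one above can be written out with DataFrame.to_excel. A minimal sketch, assuming an Excel engine such as openpyxl is installed; the helper name export, the file stem visitors and the CSV fallback are illustrative choices, not part of the answer:

```python
import os
import tempfile

import pandas as pd

# Three rows in the long layout shown above (values taken from Shop A).
long_df = pd.DataFrame(
    [['Visitors', '1 February 2020', 'Saturday', 'Shop A', 'In', 'Men', 100],
     ['Visitors', '1 February 2020', 'Saturday', 'Shop A', 'In', 'Women', 20],
     ['Visitors', '1 February 2020', 'Saturday', 'Shop A', 'In', 'Children', 30]],
    columns=['Ppl_type', 'Date', 'Weekday', 'Shop',
             'In/Out', 'Visitor_Type', 'Number'])

def export(frame, folder, stem='visitors'):
    """Write frame to <stem>.xlsx, or to <stem>.csv if no Excel engine is installed."""
    path = os.path.join(folder, stem + '.xlsx')
    try:
        # index=False keeps the 0, 1, 2, ... row labels out of the sheet.
        frame.to_excel(path, index=False)
    except ImportError:
        # pandas needs an optional engine (e.g. openpyxl) to write .xlsx files.
        path = os.path.join(folder, stem + '.csv')
        frame.to_csv(path, index=False)
    return path

out_dir = tempfile.mkdtemp()
written = export(long_df, out_dir)
print(written)
```

Writing the whole frame at once avoids the cell-by-cell worksheet.write calls mentioned in the question.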
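In Python, to split one row into several rows, you can use the split() method, which splits a string into a list of substrings at a given delimiter. Here is a sample snippet:
row = "John,Smith,25,New York"
delimiter = ","
split_row = row.split(delimiter)
print(split_row)
In this example, the variable row holds a string of four comma-separated values. We define delimiter as a comma and pass it to the split() method of row, which splits the string into a list of substrings. The resulting split_row list contains four elements: "John", "Smith", "25" and "New York".
Once you have the list of substrings, you can use it to create multiple rows. For example, you can loop over the list and create a new row for each value:
for value in split_row:
    new_row = value
    print(new_row)
This creates a new row for each value in the split_row list. The resulting output will be:
John
Smith
25
New York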
You can hard-code a parser for this:
def split_rows(row):
    # The first four fields are shared by every output row
    base = [row[0], parsed_date(row[1]), row[2], row[3]]
    return [
        base + ['In', 'Man', row[5]],
        base + ['In', 'Woman', row[6]],
        base + ['In', 'Children', row[7]],
        base + ['Out', 'Man', row[10]],
        base + ['Out', 'Woman', row[11]],
        base + ['Out', 'Children', row[12]]
    ]
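Then, assuming data is a list of lists holding the data:
import csv
final_rows = []
for d in data:
    for row in split_rows(d):
        final_rows.append(row)
# newline='' stops the csv module from writing blank lines on Windows
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(final_rows)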
Then you only need to implement parsed_date.
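One possible parsed_date, as a minimal sketch: it assumes the dates always look like '1 February 2020' and that a datetime.date object is what is wanted, neither of which the answer states:

```python
from datetime import date, datetime

def parsed_date(text):
    """Parse a date such as '1 February 2020' into a datetime.date.

    Assumption: day, full English month name, four-digit year.
    %B depends on the current locale, so this expects an English/C locale.
    """
    return datetime.strptime(text, '%d %B %Y').date()

print(parsed_date('1 February 2020'))  # 2020-02-01
```

csv.writer converts the date with str(), so the Date column would come out as 2020-02-01; return text unchanged instead if the original wording should be kept.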
PS: the other solutions posted while I was writing this are definitely better than this one.