我有以下数据框
import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)
对于时间在 >=10 和 <=12, I need to:
之间的每一行对于小时数 >12
的每一行新数据框的结果应如下所示:
# subset the dataframe based on hour value
s = df[df['hours'] < 10]
s1 = df[df['hours'] > 12]
s2 = df[df['hours'].between(10, 12)]
# Assign duplicate rows per subset and concat
pd.concat([
s,
s1.assign(hours=10),
s1.assign(hours=2),
s1.assign(hours=s1['hours'] - 12),
s2.assign(hours=10),
s2.assign(hours=s2['hours'] - 10)]
).sort_index(kind='stable', ignore_index=True)
number name hours loc
0 651 Alex 8.25 Nar
1 651 Alex 7.50 SCC
2 651 Alex 7.50 RSL
3 4267 Ankit 7.50 UNIT-C
4 4267 Ankit 10.00 UNIT-C
5 4267 Ankit 2.00 UNIT-C
6 4267 Ankit 2.00 UNIT-C
7 4267 Ankit 10.00 UNIT-C
8 4267 Ankit 2.00 UNIT-C
9 4267 Ankit 10.00 UNIT-C
10 4267 Ankit 2.00 UNIT-C
11 4267 Ankit 3.00 UNIT-C
12 4267 Ankit 10.00 UNIT-C
13 4267 Ankit 1.00 UNIT-C
14 4267 Ankit 6.50 UNIT-C
15 4267 Ankit 10.00 UNIT-C
16 4267 Ankit 2.00 UNIT-C
17 4267 Ankit 2.00 UNIT-C
18 8806 Abhishek 10.00 UNI
19 8806 Abhishek 2.00 UNI
20 8806 Abhishek 3.00 UNI
21 8806 Abhishek 10.00 UNI
22 8806 Abhishek 2.00 UNI
23 8806 Abhishek 3.00 UNI
24 8806 Abhishek 10.00 UNI
25 8806 Abhishek 2.00 UNI
26 8806 Abhishek 1.50 UNI
27 6841 Blake 8.00 UNKING
28 6841 Blake 8.00 UNKING
29 6841 Blake 8.00 UNKING
30 6841 Blake 8.00 UNKING