如何通过列中的标题将这些数据与 pandas 分组
sl. no
我有如下 CSV 或 Excel 格式的数据。
sl。不 | v1 | v2 | v3 |
---|---|---|---|
|
|||
1 | 243 | 45 | 3244 |
2 | 3 | 25 | 33 |
3 | 43 | 324 | 54 |
|
|||
1 | 3 | 45 | 54 |
2 | 24 | 4 | 42 |
... | ... | ... | ... |
这是虚拟 csv 文件
sl. No,v1,v2,v3,v4
Heading1,,,,
1,243,45,3244,
2,3,25,33,
3,43,324,54,
Head2,,,,
1,3,45,54,
2,24,4,42,
str.startswith
/str.contains
/…)和cumsum
来增加组编号:
group = df['sl. No'].str.startswith('Head').cumsum()
where
和 ffill
组合:
group = df['sl. No'].where(df['sl. No'].str.startswith('Head')).ffill()
pandas.to_numeric
: 的变体
group = df['sl. No'].where(pd.to_numeric(df['sl. No'], errors='coerce').isna()).ffill()
isna
/isnull
/eq
+cumsum
:
group = df['v1'].isna().cumsum()
# or for empty strings
# group = df['v1'].eq('').cumsum()
或者:
group = df['sl. No'].where(df['v1'].isna()).ffill()
# or for empty strings
# group = df['sl. No'].where(df['v1'].eq('')).ffill()
演示:
df.assign(group1 = df['sl. No'].str.startswith('Head').cumsum(),
group2 = df['sl. No'].where(df['sl. No'].str.startswith('Head')).ffill(),
group3 = df['sl. No'].where(pd.to_numeric(df['sl. No'], errors='coerce').isna()).ffill(),
group4 = df['v1'].isna().cumsum(),
group5 = df['sl. No'].where(df['v1'].isna()).ffill(),
)
sl. No v1 v2 v3 v4 group1 group2 group3 group4 group5
0 Heading1 NaN NaN NaN NaN 1 Heading1 Heading1 1 Heading1
1 1 243.0 45.0 3244.0 NaN 1 Heading1 Heading1 1 Heading1
2 2 3.0 25.0 33.0 NaN 1 Heading1 Heading1 1 Heading1
3 3 43.0 324.0 54.0 NaN 1 Heading1 Heading1 1 Heading1
4 Head2 NaN NaN NaN NaN 2 Head2 Head2 2 Head2
5 1 3.0 45.0 54.0 NaN 2 Head2 Head2 2 Head2
6 2 24.0 4.0 42.0 NaN 2 Head2 Head2 2 Head2