Adding multiple blank rows between pandas groups without append

Question

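I want to add a few blank rows between each groupby group in my pandas dataframe. I know similar questions have been asked before, but all the answers I could find rely on the recently discontinued append function. I think I'm close, but I can't get it to work.

From what I've read, the idea is to use the concat function in place of append, so I've been trying to 1) create my groups, 2) create a blank dataframe with the correct columns and number of rows, and then 3) loop through the groups and concatenate each one with the blank dataframe. This looks something like:

Current df: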

    column1    column2    column3
0      a          1        blue
1      b          2        blue
2      a          1        green
3      b          2        green
4      a          1        black
5      b          2        black

Note: my df is already sorted by column3, so the rows are already "grouped" this way.

What I'm trying:

# Create my groups by the desired column
dfg = df.groupby("column3")

# Create my blank df with the same columns as my main df and with the desired number of blank rows
blank_df5 = pd.DataFrame(columns=['column1','column2','column3'],index=['0','1','2','3','4'])

# Loop through and concatenate groups and the blank df
for colors in dfg:
    pd.concat([colors, blank_df5], ignore_index=True)

print(dfg)

This returns: TypeError: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid

What I expect/want:

    column1    column2    column3
0      a          1        blue
1      b          2        blue
0
1
2
3
4
2      a          1        green
3      b          2        green
0
1
2
3
4
4      a          1        black
5      b          2        black

I then tried putting the groups into their own dfs and looping through those:

dfg = df.groupby('column1')
[dfg.get_group(x) for x in dfg.groups]

blank_df5 = pd.DataFrame(columns=['column1','column2','column3'],index=['0','1','2','3','4'])

for colors in dfg:
    pd.concat([colors, blank_df5], ignore_index=True)

# I also tried [pd.concat([colors, blank_df5], ignore_index=True) for column3 in dfw] with the same result

The result is still: TypeError: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid

Other things I've tried:

mask = df['column3'].ne(df['column3'].shift(-1))
df1 = pd.DataFrame('', index=mask.index[mask] + .5, columns=df.columns)

dfg = pd.concat([df,df1]).sort_index().reset_index(drop=True).iloc[:-1]

print(dfg)

This works to add one blank row between groups, but I can't get it to add more than that.

dfg = (pd.concat([df, 
            df.groupby('column3').apply(lambda x: x.shift(-1).iloc[-1]).reset_index()])
           .sort_values('column3')
           .reset_index(drop=True))

print(dfg)

This returns: ValueError: cannot insert column3, already exists

dfg = df.groupby('column1')

for colors in dfg:
        new_rows = 5
        new_index = pd.RangeIndex(len(colors)*(new_rows+1))
        dfg = pd.DataFrame(np.nan, index=new_index, columns=df.columns)
        ids = np.arange(len(colors))*(new_rows+1)
        dfg.loc[ids] = df.values

print(dfg)

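This returns: ValueError: could not broadcast input array from shape (710,) into shape (2,)

If I remove the loop and just run what's inside it, it adds the blank rows between every row of data instead of between groups.

Hopefully this makes sense; thanks in advance for any help.

In case anyone is curious, the reason I need to do this is to dump the data into Excel in this format (I know, it's the company's decision, not mine) for further human analysis and manipulation. I'm using xlwings for the dump, but I couldn't find a way to separate the groups with xlwings during or after the dump. I'm definitely open to suggestions on how to do that, too.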

Answer 1

Following your second approach:

N = 5

grps = df.groupby("column3", sort=False)

out = pd.concat(
    [
        pd.concat([g, pd.DataFrame("", index=range(N), columns=df.columns)])
        if i < len(grps)-1 else g for i, (_, g) in enumerate(grps)
    ]
)

Output:

print(out)

  column1 column2 column3
0       a       1    blue
1       b       2    blue
0                        
1                        
2                        
3                        
4                        
2       a       1   green
3       b       2   green
0                        
1                        
2                        
3                        
4                        
4       a       1   black
5       b       2   black

[16 rows x 3 columns]
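
For reuse, the same idea could be wrapped in a small function. This is only a sketch: the name add_blank_rows and its parameters (by, n_blank, fill) are made up for illustration and are not part of pandas, and it assumes the frame is already sorted so that each group's rows are contiguous, as in the question.

```python
import pandas as pd


def add_blank_rows(df, by, n_blank, fill=""):
    """Return a new frame with n_blank filler rows between consecutive groups.

    Hypothetical helper (not a pandas API); assumes rows of each group are contiguous.
    """
    # One reusable block of filler rows with the same columns as df.
    blank = pd.DataFrame(fill, index=range(n_blank), columns=df.columns)
    pieces = []
    # sort=False keeps the groups in their order of first appearance;
    # iterating a groupby yields (key, group) tuples, so unpack them.
    for _, group in df.groupby(by, sort=False):
        pieces.extend([group, blank])
    if not pieces:
        return df.copy()
    # Drop the trailing filler block so the result ends with real data.
    return pd.concat(pieces[:-1], ignore_index=True)


df = pd.DataFrame(
    {
        "column1": ["a", "b", "a", "b", "a", "b"],
        "column2": [1, 2, 1, 2, 1, 2],
        "column3": ["blue", "blue", "green", "green", "black", "black"],
    }
)
out = add_blank_rows(df, "column3", n_blank=5)
print(out)
```

Unlike the output above, ignore_index=True renumbers the rows consecutively instead of keeping the original labels. Passing fill=float("nan") should give NaN filler rows instead of empty strings, which Excel writers typically render as empty cells.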

Answer 2

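You could try appending blank rows after each color, but appending is inefficient.

Looking at the answers here and here, I found the code below to be a better solution. Note that I'm assuming the color groups always contain exactly 2 rows; if that is not the case, the code will have to be changed.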

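import numpy as np
import pandas as pd

# create an empty dataframe with the required number of rows first
n = 3  # number of blank rows to add between groups
n_groups = len(df) // 2  # assumes every group has exactly 2 rows
# n_groups blocks of (2 + n) rows, minus the trailing blanks after the last group;
# integer arithmetic is needed because RangeIndex does not accept a float
new_index = pd.RangeIndex(n_groups * (n + 2) - n)
new_df = pd.DataFrame(np.nan, index=new_index, columns=df.columns)

# fill it with the original data frame values at the required indices:
# the first and second row of each group repeat every n + 2 positions
arr = np.arange(0, len(new_df), step=n + 2), np.arange(1, len(new_df), step=n + 2)
ids = np.sort(np.concatenate(arr))
new_df.loc[ids] = df.values
new_df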

Output:

   column1  column2 column3
0        a      1.0    blue
1        b      2.0    blue
2      NaN      NaN     NaN
3      NaN      NaN     NaN
4      NaN      NaN     NaN
5        a      1.0   green
6        b      2.0   green
7      NaN      NaN     NaN
8      NaN      NaN     NaN
9      NaN      NaN     NaN
10       a      1.0   black
11       b      2.0   black
