Merging sub-group values using Python pandas

Question  Votes: 0  Answers: 2

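I want to attach each Accounting Order number to every row of its transaction group, as the first column of the output file. Input file: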

01 2019-03-01 Travel     1500 DCA CR
04 2019-03-01 Allowance   300 ATC DR
05 2019-03-02 Local Trip  100 TCO CR
             Accounting Order 190291

22 2019-02-01 Charges     2500 DCA CR
98 2019-02-08 Allowance    900 ATC DR
36 2019-01-30 Local Trip    50 TCO CR
74 2019-02-09 Court fees   300 ATC DR
             Accounting Order 195297

33 2019-03-01 Travel     1500 DCA CR
97 2019-03-01 Allowance   300 ATC DR
             Accounting Order 180876

The output should be:

190291 01 2019-03-01 Travel     1500 DCA CR
190291 04 2019-03-01 Allowance   300 ATC DR
190291 05 2019-03-02 Local Trip  100 TCO CR
195297 22 2019-02-01 Charges     2500 DCA CR
195297 98 2019-02-08 Allowance    900 ATC DR
195297 36 2019-01-30 Local Trip    50 TCO CR
195297 74 2019-02-09 Court fees   300 ATC DR
180876 33 2019-03-01 Travel     1500 DCA CR
180876 97 2019-03-01 Allowance   300 ATC DR

Is there a way to attach the order number to the rows like this? Any help or suggestions are appreciated.

python pandas
2 Answers
0
Votes

Read the file with pd.read_fwf(), adjusting the column positions as needed, then back-fill the order number onto the rows above it.

import pandas as pd

# Read the file by character position (fixed-width columns)
cols = [(0,2),(2,13),(14,24),(25,29),(30,34),(34,37)]
names = ['id','date','desc','value','type1','type2']
df = pd.read_fwf('my_file_22.txt', header=None, colspecs=cols, names=names)

# On 'Accounting Order' rows the order number is split across type1 and type2;
# join the two halves into a new column (NaN on every other row)
is_order = df.desc == 'Accounting'
df['Accounting Order'] = df.loc[is_order, 'type1'] + df.loc[is_order, 'type2']

# Back-fill the order number upward, then drop the marker rows and empty rows
nans = is_order | df.id.isna()
df = df.bfill()
df = df[~nans]

It produces the following output:

    id      date        desc        value   type1   type2   Accounting Order
0   1.0     2019-03-01  Travel      1500    DCA     CR      190291
1   4.0     2019-03-01  Allowance   300     ATC     DR      190291
2   5.0     2019-03-02  Local Trip  100     TCO     CR      190291
5   22.0    2019-02-01  Charges     2500    DCA     CR      195297
6   98.0    2019-02-08  Allowance   900     ATC     DR      195297
7   36.0    2019-01-30  Local Trip  50      TCO     CR      195297
8   74.0    2019-02-09  Court fees  300     ATC     DR      195297
11  33.0    2019-03-01  Travel      1500    DCA     CR      180876
12  97.0    2019-03-01  Allowance   300     ATC     DR      180876
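
The back-fill step can be seen in isolation on a small in-memory frame. This is only a minimal sketch: the column name `order` and the toy rows are made up for illustration (the values come from the question), and the marker rows are assumed to already carry the full order number.

```python
import numpy as np
import pandas as pd

# Toy frame: transaction rows have no order number yet; the marker row
# that closes each group carries it.
df = pd.DataFrame({
    "id": ["01", "04", None, "33", None],
    "desc": ["Travel", "Allowance", "Accounting", "Travel", "Accounting"],
    "order": [np.nan, np.nan, "190291", np.nan, "180876"],
})

# Back-fill copies each order number upward onto the rows before it.
df["order"] = df["order"].bfill()

# Drop the marker rows and move the order number to the first column.
result = df[df["desc"] != "Accounting"][["order", "id", "desc"]].reset_index(drop=True)
print(result)
```

Selecting the columns in the desired order is also one way to get the order number in front, as the question asks.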

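Notes:

1) The file is read by character position, so this breaks as soon as the column widths differ from the ones assumed above;

2) The data assumed for this solution is: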

01 2019-03-01 Travel     1500 DCA CR
04 2019-03-01 Allowance   300 ATC DR
05 2019-03-02 Local Trip  100 TCO CR
              Accounting Order 190291

22 2019-02-01 Charges    2500 DCA CR
98 2019-02-08 Allowance   900 ATC DR
36 2019-01-30 Local Trip   50 TCO CR
74 2019-02-09 Court fees  300 ATC DR
              Accounting Order 195297

33 2019-03-01 Travel     1500 DCA CR
97 2019-03-01 Allowance   300 ATC DR
              Accounting Order 180876
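
If the column widths cannot be relied on (note 1), one possible alternative is to split each line with a regular expression instead of by position. The sketch below is an assumption-laden illustration, not part of the original answer: the pattern `ROW` and the column name `order` are hypothetical, and it assumes fields are separated by whitespace with the amount being the first all-digit field after the description.

```python
import pandas as pd

# A few lines from the question's input, kept in memory for the sketch.
SAMPLE = """\
01 2019-03-01 Travel     1500 DCA CR
04 2019-03-01 Allowance   300 ATC DR
05 2019-03-02 Local Trip  100 TCO CR
             Accounting Order 190291

33 2019-03-01 Travel     1500 DCA CR
97 2019-03-01 Allowance   300 ATC DR
             Accounting Order 180876
"""

# Hypothetical pattern: the description is whatever sits between the date
# and the numeric amount, so it may contain spaces ("Local Trip").
ROW = (r"^\s*(?P<id>\d+)\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<desc>.+?)"
       r"\s+(?P<value>\d+)\s+(?P<type1>\w+)\s+(?P<type2>\w+)\s*$")

lines = pd.Series(SAMPLE.splitlines())

# Order number on marker lines, NaN elsewhere, then back-filled upward.
order = lines.str.extract(r"Accounting Order\s+(\d+)", expand=False).bfill()

# One column per named group; lines that do not match become all-NaN rows.
rows = lines.str.extract(ROW)
rows.insert(0, "order", order)

# Keep only the real transaction rows.
result = rows.dropna(subset=["id"]).reset_index(drop=True)
print(result)
```

All extracted fields are strings here; convert `value` or `date` afterwards if numeric or datetime types are needed.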

1
Votes

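For example, use the code below. It splits every line into two parallel lists depending on whether the line contains 'Accounting Order': z[0] holds the order numbers and z[1] holds the remaining lines. It then runs read_fwf on the non-order lines in z[1] and attaches the back-filled order numbers from z[0]: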

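from io import StringIO

import numpy as np
import pandas as pd

with open('input.txt') as f:
    s = f.read()

# z[0]: order number on 'Accounting Order' lines, NaN on all other lines
# z[1]: the original text of the other lines, '' on 'Accounting Order' lines
z = list(zip(*[(x.split('Accounting Order')[1].strip(), '') if 'Accounting Order' in x
               else (np.nan, x)
               for x in s.splitlines()]))

df = pd.concat([
    pd.DataFrame(z[0], columns=['Accounting Order']).bfill(),
    # keep blank lines as all-NaN rows so both frames stay aligned row by row
    pd.read_fwf(StringIO('\n'.join(z[1])), header=None, skip_blank_lines=False)],
    axis=1).dropna()

print(df)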

Output:

   Accounting Order     0           1           2            3
0            190291   1.0  2019-03-01      Travel  1500 DCA CR
1            190291   4.0  2019-03-01   Allowance   300 ATC DR
2            190291   5.0  2019-03-02  Local Trip   100 TCO CR
5            195297  22.0  2019-02-01     Charges  2500 DCA CR
6            195297  98.0  2019-02-08   Allowance   900 ATC DR
7            195297  36.0  2019-01-30  Local Trip    50 TCO CR
8            195297  74.0  2019-02-09  Court fees   300 ATC DR
11           180876  33.0  2019-03-01      Travel  1500 DCA CR
12           180876  97.0  2019-03-01   Allowance   300 ATC DR
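
Since the requested output is plain text with the order number in front of each original line, the same grouping idea can also be sketched without pandas. This is a minimal illustration under assumptions: the helper name `prefix_orders` is hypothetical, and every group is assumed to end with its 'Accounting Order' line.

```python
# A few lines from the question's input, kept in memory for the sketch.
SAMPLE = """\
01 2019-03-01 Travel     1500 DCA CR
04 2019-03-01 Allowance   300 ATC DR
05 2019-03-02 Local Trip  100 TCO CR
             Accounting Order 190291

33 2019-03-01 Travel     1500 DCA CR
97 2019-03-01 Allowance   300 ATC DR
             Accounting Order 180876
"""


def prefix_orders(lines):
    """Prefix each transaction line with the order number that closes its group."""
    out, pending = [], []
    for line in lines:
        if "Accounting Order" in line:
            # The marker line ends the group: flush the buffered rows.
            order = line.split("Accounting Order")[1].strip()
            out.extend(f"{order} {row}" for row in pending)
            pending = []
        elif line.strip():
            # Buffer transaction rows until their order number is known.
            pending.append(line.rstrip())
    # Rows after the last marker line (if any) are silently dropped.
    return out


for row in prefix_orders(SAMPLE.splitlines()):
    print(row)
```

Because the original lines are copied unchanged, the column spacing of the input is preserved in the output.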