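I want to prepend the accounting order number, which closes each group of transactions, to every line of that group in the output file. Input file: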
01 2019-03-01 Travel 1500 DCA CR
04 2019-03-01 Allowance 300 ATC DR
05 2019-03-02 Local Trip 100 TCO CR
Accounting Order 190291
22 2019-02-01 Charges 2500 DCA CR
98 2019-02-08 Allowance 900 ATC DR
36 2019-01-30 Local Trip 50 TCO CR
74 2019-02-09 Court fees 300 ATC DR
Accounting Order 195297
33 2019-03-01 Travel 1500 DCA CR
97 2019-03-01 Allowance 300 ATC DR
Accounting Order 180876
The output should be:
190291 01 2019-03-01 Travel 1500 DCA CR
190291 04 2019-03-01 Allowance 300 ATC DR
190291 05 2019-03-02 Local Trip 100 TCO CR
195297 22 2019-02-01 Charges 2500 DCA CR
195297 98 2019-02-08 Allowance 900 ATC DR
195297 36 2019-01-30 Local Trip 50 TCO CR
195297 74 2019-02-09 Court fees 300 ATC DR
180876 33 2019-03-01 Travel 1500 DCA CR
180876 97 2019-03-01 Allowance 300 ATC DR
Is there a way to attach the order number to the lines like this? Any help or suggestion is appreciated.
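Read the file with pd.read_fwf(), then back-fill the order number with bfill() (equivalent to fillna(method='backfill'), which recent pandas versions deprecate):
import pandas as pd

# read the file by character position (assumes fixed column positions)
cols = [(0,2),(2,13),(14,24),(25,29),(30,34),(34,37)]
names = ['id','date','desc','value','type1','type2']
df = pd.read_fwf('my_file_22.txt', header=None, colspecs=cols, names=names)
# on 'Accounting Order' rows the number is split across type1 and type2; rejoin it
is_order = df.desc == 'Accounting'
df['Accounting Order'] = df.loc[is_order, 'type1'] + df.loc[is_order, 'type2']
# mark the order rows and the blank rows for removal
nans = is_order | df.id.isna()
# copy each order number upward onto the rows of its group
df = df.bfill()
df = df[~nans]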
It produces the following output:
id date desc value type1 type2 Accounting Order
0 1.0 2019-03-01 Travel 1500 DCA CR 190291
1 4.0 2019-03-01 Allowance 300 ATC DR 190291
2 5.0 2019-03-02 Local Trip 100 TCO CR 190291
5 22.0 2019-02-01 Charges 2500 DCA CR 195297
6 98.0 2019-02-08 Allowance 900 ATC DR 195297
7 36.0 2019-01-30 Local Trip 50 TCO CR 195297
8 74.0 2019-02-09 Court fees 300 ATC DR 195297
11 33.0 2019-03-01 Travel 1500 DCA CR 180876
12 97.0 2019-03-01 Allowance 300 ATC DR 180876
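To see the back-fill step in isolation, here is a minimal sketch on made-up values (the Series below is illustrative and is not read from the file). Each known order number is copied upward into the missing entries that precede it, which is why the number at the end of a group reaches every row of that group.

```python
import pandas as pd

# Toy data (an assumption for illustration): the order number is known
# only on the last row of each group, as in the input file.
orders = pd.Series([None, None, "190291", None, "195297"], dtype="object")

# bfill() fills each missing entry with the next known value below it.
filled = orders.bfill()
print(filled.tolist())
```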
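Observations:
1) Because the file is read by position, problems will arise as soon as the column widths differ;
2) The data used for this solution is: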
01 2019-03-01 Travel 1500 DCA CR
04 2019-03-01 Allowance 300 ATC DR
05 2019-03-02 Local Trip 100 TCO CR
Accounting Order 190291
22 2019-02-01 Charges 2500 DCA CR
98 2019-02-08 Allowance 900 ATC DR
36 2019-01-30 Local Trip 50 TCO CR
74 2019-02-09 Court fees 300 ATC DR
Accounting Order 195297
33 2019-03-01 Travel 1500 DCA CR
97 2019-03-01 Allowance 300 ATC DR
Accounting Order 180876
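For example, use the following code. It splits all lines into two lists, z[0] and z[1], based on whether they contain 'Accounting Order': z[0] holds the order numbers and z[1] holds the other lines. It then runs read_fwf on the non-order lines in z[1] and adds the bfill-ed order numbers from z[0] as a column: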
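import io
import numpy as np
import pandas as pd

with open('input.txt') as f:
    s = f.read()
# z[0]: the order number on 'Accounting Order' lines, NaN elsewhere
# z[1]: the transaction lines, with '' in place of 'Accounting Order' lines
z = list(zip(*[(x.split('Accounting Order')[1].strip(), '') if 'Accounting Order' in x
               else (np.nan, x)
               for x in s.splitlines()]))
df = pd.concat([
    pd.DataFrame(list(z[0]), columns=['Accounting Order']).bfill(),
    # keep blank lines as all-NaN rows so both frames stay aligned by row index
    pd.read_fwf(io.StringIO('\n'.join(z[1])), header=None, skip_blank_lines=False)],
    axis=1).dropna()
print(df)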
Output:
Accounting Order 0 1 2 3
0 190291 1.0 2019-03-01 Travel 1500 DCA CR
1 190291 4.0 2019-03-01 Allowance 300 ATC DR
2 190291 5.0 2019-03-02 Local Trip 100 TCO CR
5 195297 22.0 2019-02-01 Charges 2500 DCA CR
6 195297 98.0 2019-02-08 Allowance 900 ATC DR
7 195297 36.0 2019-01-30 Local Trip 50 TCO CR
8 195297 74.0 2019-02-09 Court fees 300 ATC DR
11 180876 33.0 2019-03-01 Travel 1500 DCA CR
12 180876 97.0 2019-03-01 Allowance 300 ATC DR
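If only the text output shown in the question is needed, the same idea can be sketched without pandas and without relying on column widths. This is a hedged alternative, not the method of either answer above: the function name prefix_order_numbers and the sample string are made up for illustration, and the sketch assumes every group ends with a line starting with 'Accounting Order'.

```python
def prefix_order_numbers(lines, marker="Accounting Order"):
    """Prefix each transaction line with the order number that closes its group."""
    result, pending = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank separator lines
        if line.startswith(marker):
            # the order number is whatever follows the marker text
            order = line[len(marker):].strip()
            result.extend(f"{order} {row}" for row in pending)
            pending = []
        else:
            pending.append(line)  # hold the row until its order number is seen
    # rows after the last marker have no order number and are dropped
    return result


sample = """\
01 2019-03-01 Travel 1500 DCA CR
04 2019-03-01 Allowance 300 ATC DR
Accounting Order 190291
22 2019-02-01 Charges 2500 DCA CR
Accounting Order 195297
"""

for row in prefix_order_numbers(sample.splitlines()):
    print(row)
```

Buffering the rows of a group until its closing line arrives plays the same role as the back-fill in the pandas versions, and it keeps the leading zeros of the ids (01, 04) because nothing is parsed as a number.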