I have a DataFrame, read from a CSV file, that keeps being updated: every time the script runs, a new row is added. It looks like this:
df = pd.read_csv(file_path)
print(df.to_string(index=False))
timestamp Puts Calls PutCh CallCh ChDiff
09:41:12 AM 2027891 1820724 280101 200974 79127
09:48:51 AM 2075976 1862053 328186 242303 85883
09:58:48 AM 2091487 1885842 343697 266092 77605
10:08:21 AM 2091879 1918592 344089 298842 45247
02:26:00 PM 1995234 1941917 247444 322167 -74723
02:44:36 PM 1990071 1934874 242281 315124 -72843
02:56:17 PM 1970892 1938472 223102 318722 -95620
Now I want the difference between each row and the row before it. I read about df.diff(), so I dropped the timestamp column to get a new DataFrame named df1 and wrote:
df1.diff()
My output is:
Puts Calls PutCh CallCh ChDiff
NaN NaN NaN NaN NaN
48085.0 41329.0 48085.0 41329.0 6756.0
15511.0 23789.0 15511.0 23789.0 -8278.0
392.0 32750.0 392.0 32750.0 -32358.0
-96645.0 23325.0 -96645.0 23325.0 -119970.0
-5163.0 -7043.0 -5163.0 -7043.0 1880.0
-19179.0 3598.0 -19179.0 3598.0 -22777.0
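I would like these difference values added to the original DataFrame (df), in parentheses under each column. In more detail, my output should look like this (the timestamp column should also be present, as in my df):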
Puts Calls PutCh CallCh ChDiff
2027891 1820724 280101 200974 79127
2075976 1862053 328186 242303 85883
(48085) (41329) (48085) (41329) (6756)
2091487 1885842 343697 266092 77605
(15511) (23789) (15511) (23789) (-8278)
2091879 1918592 344089 298842 45247
(392) (32750) (392) (32750) (-32358)
Is there a way to do this?
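Convert the output of diff to strings, add the parentheses, concat the result back onto the original DataFrame, and finally call sort_index to put the rows back in order: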
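# Differences between consecutive rows; skip the first row, which is all NaN
tmp = (df.drop(columns='timestamp').diff()
         .iloc[1:]
         .apply(lambda s: '('+s.astype(str)+')')  # wrap each value in parentheses
      )
# A stable sort keeps each original row ahead of its difference row
out = pd.concat([df, tmp]).sort_index(kind='mergesort')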
Output:
timestamp Puts Calls PutCh CallCh ChDiff
0 09:41:12 AM 2027891 1820724 280101 200974 79127
1 09:48:51 AM 2075976 1862053 328186 242303 85883
1 NaN (48085.0) (41329.0) (48085.0) (41329.0) (6756.0)
2 09:58:48 AM 2091487 1885842 343697 266092 77605
2 NaN (15511.0) (23789.0) (15511.0) (23789.0) (-8278.0)
3 10:08:21 AM 2091879 1918592 344089 298842 45247
3 NaN (392.0) (32750.0) (392.0) (32750.0) (-32358.0)
4 02:26:00 PM 1995234 1941917 247444 322167 -74723
4 NaN (-96645.0) (23325.0) (-96645.0) (23325.0) (-119970.0)
5 02:44:36 PM 1990071 1934874 242281 315124 -72843
5 NaN (-5163.0) (-7043.0) (-5163.0) (-7043.0) (1880.0)
6 02:56:17 PM 1970892 1938472 223102 318722 -95620
6 NaN (-19179.0) (3598.0) (-19179.0) (3598.0) (-22777.0)
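
The output above still carries a trailing .0 on every difference and shows NaN in the timestamp column of the difference rows. One possible variant is sketched below. It assumes the differences are always whole numbers and that an empty timestamp is acceptable on the difference rows; neither is stated in the question. The DataFrame is rebuilt inline from the first four rows of the question's data so that the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question (first four rows only).
df = pd.DataFrame({
    "timestamp": ["09:41:12 AM", "09:48:51 AM", "09:58:48 AM", "10:08:21 AM"],
    "Puts": [2027891, 2075976, 2091487, 2091879],
    "Calls": [1820724, 1862053, 1885842, 1918592],
    "PutCh": [280101, 328186, 343697, 344089],
    "CallCh": [200974, 242303, 266092, 298842],
    "ChDiff": [79127, 85883, 77605, 45247],
})

# Row-to-row differences; the first row has no predecessor, so drop it.
diffs = df.drop(columns="timestamp").diff().iloc[1:]

# Assumption: differences are whole numbers, so casting to int is lossless
# and removes the trailing ".0" before the parentheses are added.
tmp = diffs.astype(int).apply(lambda s: "(" + s.astype(str) + ")")

# Assumption: an empty string is preferable to NaN in the timestamp column.
tmp.insert(0, "timestamp", "")

# mergesort is stable, so each original row stays ahead of its difference row.
out = pd.concat([df, tmp]).sort_index(kind="mergesort")
print(out.to_string(index=False))
```

Note that the numeric columns of out end up with object dtype, because they mix integers and strings. This is fine for display, but any further arithmetic should be done on df rather than on out.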