My algorithm produces this PySpark DataFrame:
+------+--------------------+
|     A|                   b|
+------+--------------------+
|     1|1.000540895285929161|
|     2|1.097289726627339219|
|     3|0.963925596369865420|
|     4|0.400642772674179290|
|     5|1.136213095583983134|
|     6|1.563124989279187345|
|     7|0.924395764582530139|
|     8|0.833237679638091343|
|     9|1.381905515925928345|
|    10|1.315542676739417356|
|    11|0.496544353345593242|
|    12|1.075150956754565637|
|    13|0.912020266273109506|
|    14|0.445620998720738948|
|    15|1.440258342829831504|
|    16|0.929157554709733613|
|    17|1.168496273549324876|
|    18|0.836936489952743701|
|    19|0.629466356196215569|
|    20|1.145973619225162914|
|    21|0.987205342817734242|
|    22|1.442075381077187609|
|    23|0.958558287841447591|
|    24|0.924638906376455542|
+------+--------------------+
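I need to compute a new column named F, defined by a recursive calculation:
F(i) = F(i-1) * 0.25 + b(i+1) * 0.50 + b(i) * 0.25
where i is the row index. Only for i = 1 is F defined differently, with b(1) taking the place of F(0):
F(1) = b(1) * 0.25 + b(2) * 0.50 + b(1) * 0.25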
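To make the recurrence concrete, here is a minimal plain-Python sketch (not PySpark) of what I am trying to reproduce. The function name `recursive_f` is just illustrative. The definition above does not say what b(i+1) should be on the last row, so the sketch assumes the last row reuses its own b value; that part is a guess and may not be the behaviour I end up wanting.

```python
def recursive_f(b):
    """Return [F(1), ..., F(n)] for b = [b(1), ..., b(n)]."""
    f = []
    # Seed: for i = 1 the formula uses b(1) where F(0) would otherwise appear.
    prev = b[0] if b else None
    for i, cur in enumerate(b):
        # ASSUMPTION: b(n+1) is undefined in the question; reuse b(n) on the last row.
        nxt = b[i + 1] if i + 1 < len(b) else cur
        prev = prev * 0.25 + nxt * 0.50 + cur * 0.25
        f.append(prev)
    return f


# First three b values from the table above.
sample_b = [1.000540895285929161, 1.097289726627339219, 0.963925596369865420]
print(recursive_f(sample_b))
```

Each F(i) depends on the previous F(i-1), which is what makes this awkward to express as an ordinary column expression.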
How should I compute this? Should I use the lag and lead functions?
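One thing I noticed while thinking about it: the b(i+1) term is a one-row look-ahead (what `lead` would give), but F(i-1) is not a lag of any existing column. Unrolling the recurrence gives F(i) = 0.25^i * (b(1) + sum over k <= i of 4^k * c(k)), with c(k) = 0.50 * b(k+1) + 0.25 * b(k), which only needs a shifted column and a running sum. Below is a plain-Python sketch of that unrolled form, not tested in PySpark. The name `unrolled_f` is hypothetical, and the last row again assumes b(n+1) = b(n), which the definition does not specify.

```python
from itertools import accumulate


def unrolled_f(b):
    """Compute [F(1), ..., F(n)] without referring to earlier F values."""
    # Look-ahead column b(i+1).
    # ASSUMPTION: the last row reuses b(n), since b(n+1) is undefined.
    lead = b[1:] + b[-1:]
    # Non-recursive part of each row: c(k) = 0.50 * b(k+1) + 0.25 * b(k).
    c = [0.50 * nxt + 0.25 * cur for cur, nxt in zip(b, lead)]
    # Scale by 4^k so that a plain running sum can be rescaled afterwards.
    scaled = [c_k * 4.0 ** k for k, c_k in enumerate(c, start=1)]
    running = accumulate(scaled)
    # F(i) = 0.25^i * (b(1) + running_sum(i)); b(1) plays the role of F(0).
    return [0.25 ** i * (b[0] + s) for i, s in enumerate(running, start=1)]


print(unrolled_f([1.0, 2.0, 3.0]))
```

The 4^k factor grows quickly, so this form would probably lose precision or overflow on a long DataFrame; for the 24 rows here it should be fine, but I am not sure it is the right approach in general.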