Recursive column in a PySpark DataFrame

Question · Votes: 1 · Answers: 1

My algorithm produces this PySpark DataFrame:

+------+--------------------+
| A    |        b           |
+------+--------------------+
|     1|1.000540895285929161|
|     2|1.097289726627339219|
|     3|0.963925596369865420|
|     4|0.400642772674179290|
|     5|1.136213095583983134|
|     6|1.563124989279187345|
|     7|0.924395764582530139|
|     8|0.833237679638091343|
|     9|1.381905515925928345|
|    10|1.315542676739417356|
|    11|0.496544353345593242|
|    12|1.075150956754565637|
|    13|0.912020266273109506|
|    14|0.445620998720738948|
|    15|1.440258342829831504|
|    16|0.929157554709733613|
|    17|1.168496273549324876|
|    18|0.836936489952743701|
|    19|0.629466356196215569|
|    20|1.145973619225162914|
|    21|0.987205342817734242|
|    22|1.442075381077187609|
|    23|0.958558287841447591|
|    24|0.924638906376455542|
+------+--------------------+

I need to compute a new column named F as a kind of recursive calculation:

F(I) = F(I- 1) * 0.25 
    + b(I+ 1) * 0.50 +  b(I) * 0.25

where i is the row index; only for i = 1 is the value of F(1) given by:

F(1) = b(1) * 0.25
     + b(2) * 0.50 + b(1) * 0.25

How should I compute this? Should I use the lag and lead functions?
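One observation worth making explicit: lead can fetch b(i+1) within a window, but because F(i) depends on F(i-1), the recurrence cannot be expressed with lag/lead alone — a running, order-dependent pass is needed. A minimal sketch of that sequential pass in plain Python (the function name recursive_f and the None for the last row are assumptions; the question does not say how to handle the final row, which has no b(i+1)):

```python
def recursive_f(b):
    """Compute F over b values ordered by column A.

    Base case (i = 1):  F(1) = b(1)*0.25 + b(2)*0.50 + b(1)*0.25
    Recurrence (i > 1): F(i) = F(i-1)*0.25 + b(i+1)*0.50 + b(i)*0.25
    """
    # Base case: F(0) is replaced by b(1), per the question's F(1) formula.
    f = [b[0] * 0.25 + b[1] * 0.50 + b[0] * 0.25]
    for i in range(1, len(b) - 1):
        # F(i) = F(i-1)*0.25 + b(i+1)*0.50 + b(i)*0.25
        f.append(f[-1] * 0.25 + b[i + 1] * 0.50 + b[i] * 0.25)
    # Assumption: no b(i+1) exists for the last row, so F is undefined there.
    f.append(None)
    return f
```

For a DataFrame this small (24 rows), one pragmatic option is to collect the ordered b values to the driver, run the pass, and join the result back — e.g. `b = [r["b"] for r in df.orderBy("A").collect()]` before calling `recursive_f(b)`. This does not scale to large data, but neither does an inherently sequential recurrence.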

apache-spark pyspark spark-dataframe
1 Answer