Major speedup question for a for loop in Pandas/Numpy over bitwise_xor accumulate

Question

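OK, I'm using a for loop like the one shown below to turn this data into the data further down, using XOR accumulation. For the number of rows I have (830401), this is very, very slow. Is there any way to speed up this kind of accumulation in pandas, or to do it in numpy and then assign the result back to the numpy array itself?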


In [122]: acctable[0:20]
Out[122]: 
    what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
0      4    2   10    8    0    5    7    1   13   11
1      4    0    0    0    0    0    0    0    0    0
2      6    0    0    0    0    0    0    0    0    0
3     14    0    0    0    0    0    0    0    0    0
4     12    0    0    0    0    0    0    0    8    0
5      4    0    0    0    0    0    0    0    0    0
6      1    0    0    0    0    0    0    0    0    0

...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
830477    15    0    0    0    0    0    0    0    0    0
830478     3    0    0    0    0    0    0    0    0    0
830479    11    0    0    0    0    0    0    0    0    0
830480     9    0    0    0    0    0    0    0    0    0
830481    11    0    0    0    0    0    0    0    0    0

[830482 rows x 10 columns]

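This is what I've tried. It can actually take a full minute, and I have larger datasets to process, so any shortcut or best practice would be very helpful: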

# Update: Instead of all 800k values of 'what', I put the first 6 values in rstr so you can see how I'm XOR-accumulating. You should be able to copy/paste the first 6 rows of the data with pd.read_clipboard() and assign the result to acctable.

In [121]: rstr
Out[121]: array([ 4,  4, 12, 14,  6,  4], dtype=int8)
  
dt = np.int8
rstr = np.array(acctable.loc[:5, ('what')], dtype=dt)
for x in range(4): # # Prime Sequencing Functions
   wuttr = np.bitwise_xor.accumulate(np.r_[[rstr[-(x+1)]], acctable.loc[x, 'what':]], dtype=dt)
   acctable.loc[x+1, "what":] = wuttr[:end]

After:


In [122]: acctable[0:20]
Out[122]: 
    what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
0      4    2   10    8    0    5    7    1   13   11
1      4    0    2    8    0    0    5    2    3   14
2      6    2    2    0    8    8    8   13   15   12
3     14    8   10    8    8    0    8    0   13    2
4     12    2   10    0    8    0    0    8    8    5
5      4    8   10    0    0    8    8    8    0    8
6      1    5   13    7    7    7   15    7   15   15
...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
830477    15   15    7    0    0    5    9   14   10    3
830478     3   12    3    4    4    4    1    8    6   12
830479    11    8    4    7    3    7    3    2   10   12
830480     9    2   10   14    9   10   13   14   12    6
830481    11    2    0   10    4   13    7   10    4    8

[830482 rows x 10 columns]
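To make the before/after tables concrete, here is a minimal sketch of a single step, assuming the rule is "prepend the next row's 'what' value to the current row, XOR-accumulate, and drop the last element". The names `row0`, `seed`, and `row1` are only illustrative and do not appear in the original code.

```python
import numpy as np

# Row 0 of the table above (columns 'what' through 'dx9').
row0 = np.array([4, 2, 10, 8, 0, 5, 7, 1, 13, 11], dtype=np.int8)

# The 'what' value of row 1, which seeds the accumulation.
seed = np.int8(4)

# XOR-accumulate over [seed, row0...]; the result has one element too many,
# so the last one is dropped to get back to 10 columns.
acc = np.bitwise_xor.accumulate(np.r_[seed, row0])
row1 = acc[:-1]

print(row1)
```

Under that assumption the result matches row 1 of the "after" table, and repeating the step with row 1 as input would give row 2.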

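It's a simple accumulation, but you need the previous row before you can continue accumulating, and the only way I've found to do that is a for loop. Also, the 'rstr' variable is really just the 'what' column.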

Thanks!

Answer 1

You can try numba to speed up the computation:

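from numba import njit


@njit
def do_work(vals, row):
    # Fill row + 1 from left to right: each cell is the XOR of the cell to its
    # left (same row) and the cell above that one (previous row).
    for i in range(len(vals[0]) - 1):
        vals[row + 1, i + 1] = vals[row, i] ^ vals[row + 1, i]


# df is the question's acctable. This relies on df.values being a writable
# view of the data (single integer dtype, copy-on-write not enabled);
# otherwise assign vals back to df after the loop.
vals = df.values
for row in range(len(df) - 1):
    do_work(vals, row)

print(df)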

Prints:

   what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
0     4    2   10    8    0    5    7    1   13   11
1     4    0    2    8    0    0    5    2    3   14
2     6    2    2    0    8    8    8   13   15   12
3    14    8   10    8    8    0    8    0   13    2
4    12    2   10    0    8    0    0    8    8    5

Initial df:

   what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
0     4    2   10    8    0    5    7    1   13   11
1     4    0    0    0    0    0    0    0    0    0
2     6    0    0    0    0    0    0    0    0    0
3    14    0    0    0    0    0    0    0    0    0
4    12    0    0    0    0    0    0    0    0    0
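As a possible alternative that needs only NumPy, the loop can run over the columns instead of the rows. This is a sketch of a different technique than the numba loop above, and it assumes the same recurrence as `do_work`: each cell equals the cell to its left XOR the cell above that one. Because the 'what' column is fully known up front, column `j + 1` depends only on column `j`, so each column can be filled for all rows in one vectorized step. The function name `xor_fill_by_column` is hypothetical.

```python
import numpy as np


def xor_fill_by_column(vals):
    """Fill rows 1.. of a 2-D integer array in place, one column at a time."""
    n_cols = vals.shape[1]
    for j in range(n_cols - 1):
        # Column j is final at this point (column 0 is the given 'what' column),
        # so column j + 1 can be computed for every row below row 0 at once.
        vals[1:, j + 1] = vals[1:, j] ^ vals[:-1, j]
    return vals


# First five rows of the initial df: row 0 is fully given,
# later rows only carry their 'what' value.
vals = np.array([
    [4, 2, 10, 8, 0, 5, 7, 1, 13, 11],
    [4, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [14, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [12, 0, 0, 0, 0, 0, 0, 0, 0, 0],
], dtype=np.int8)

result = xor_fill_by_column(vals)
print(result)
```

With 10 columns this takes 9 array operations regardless of the number of rows. For the question's frame, one would presumably pass `acctable.to_numpy()` and assign the returned array back to the frame; that part is untested here.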
