Converting a string numpy.ndarray to a float numpy.ndarray

Question

I have a question. How can I convert:

import numpy as np

a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])

to:

b = np.array([[0.1,0.2,0.3], [0.3,0.4,0.5], [0.5,0.6,0.7]])

Answer 1

Here is a possible approach:

import numpy as np
a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])

# Create a placeholder list
b = []

for element in a:
  # use a list comprehension to
  #     * take the zeroeth element in each row of the 'a' array and
  #       split the string on spaces
  #     * parse through each substring thus produced
  #     * convert each of those substrings into floats
  #     * store it in the list called temp.

  temp = [float(num) for num in element[0].split()]

  # Add each temp list to the parent list 'b'
  b.append(temp)

# Convert b into an np.array
b = np.array(b)

Without the comments, it looks like this:

b = []

for element in a:
    temp = [float(num) for num in element[0].split(' ')]
    b.append(temp)
b = np.array(b)

Yields:

array([[0.1, 0.2, 0.3],
       [0.3, 0.4, 0.5],
       [0.5, 0.6, 0.7]])
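
Note that the commented version calls split() while the short version calls split(' '). For the sample data both give the same result, but they differ once values are separated by more than one space. A small sketch of the difference (the string s is made up for illustration and is not part of the question):

```python
s = '0.1  0.2 0.3'  # hypothetical input containing a double space

# split() with no argument collapses runs of whitespace
tokens_default = s.split()

# split(' ') splits on every single space, so a double space
# produces an empty string between the two values
tokens_single = s.split(' ')

values = [float(t) for t in tokens_default]

try:
    [float(t) for t in tokens_single]
    failed = False
except ValueError:
    # float('') raises ValueError
    failed = True
```

If the input may contain irregular spacing, split() without an argument is probably the safer choice.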

An alternate approach:

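I tend to prefer this approach because it uses NumPy's native casting abilities. I have not tested it, but I would not be surprised if it gave a speedup when converting large arrays.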

# transform 'a' to an array of rows full of individual strings
# use the .astype() method to then cast each value as a float
a = np.array([row[0].split() for row in a])
b = a.astype(float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
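
On recent NumPy versions the target dtype should be spelled float or np.float64, since the old np.float alias was removed in NumPy 1.24. A minimal, self-contained sketch of the cast, using intermediate names (tokens, b_direct) that are not in the original answer:

```python
import numpy as np

a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])

# Split each row's single string into separate string tokens
tokens = np.array([row[0].split() for row in a])

# Cast the string array to floats; float and np.float64 are equivalent here
b = tokens.astype(float)

# The same result in one step, passing dtype when building the array
b_direct = np.array([row[0].split() for row in a], dtype=float)
```

Both b and b_direct should be 3x3 arrays of float64.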

Hatib Taha @ Hamdisif

Answer 2

I am leaving this answer as a reference for anyone looking for a vectorized NumPy approach. TL;DR: it is not faster; use np.array([row[0].split() for row in a], dtype=float) as in the accepted answer.

I was looking for a vectorized approach to this problem and came up with the following solutions.

Using np.char.split:

import numpy as np


def to_numeric1(array, sep=' ', dtype=float):
    """
    Converts an array of strings with delimiters in it 
    to an array of specified type
    """
    split = np.char.split(array, sep=sep)
    without_lists = np.array(split.tolist())
    corrected_dimension = np.squeeze(without_lists)
    return corrected_dimension.astype(dtype)

And using pd.Series.str.split:

import pandas as pd


def to_numeric2(array, sep=' ', dtype=float):
    df = pd.DataFrame(array)
    return df[0].str.split(pat=sep, expand=True).to_numpy(dtype=dtype)

Unfortunately, both solutions are slower than the native Python loop in E. Ducateme's answer:

a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']]*10000)

%%timeit
native_python_loop(a)
# 57.8 ms ± 526 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
to_numeric1(a)
# 86.6 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
to_numeric2(a)
# 79.8 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
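
The benchmark calls native_python_loop, which is not defined in this answer. A plausible definition, assuming it simply wraps the list comprehension from the accepted answer (this is an assumption, not the author's code), together with a plain timeit call that can be run outside IPython in place of the %%timeit magic:

```python
import timeit

import numpy as np


def native_python_loop(array):
    # Assumed definition: split the single string in each row and cast to float
    return np.array([row[0].split() for row in array], dtype=float)


a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']] * 10000)

result = native_python_loop(a)

# Time 10 calls; the figure depends on the machine, so none is quoted here
elapsed = timeit.timeit(lambda: native_python_loop(a), number=10)
print(f"{elapsed / 10 * 1e3:.1f} ms per call")
```

The other functions can be timed the same way by swapping the function name inside the lambda.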

As noted in a comment by hpaulj:

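The np.char functions apply Python string methods to each element of the array. They are a convenience, but they do not improve speed. NumPy does not have fast compiled code that operates on the contents of strings; it relies on the existing Python code. "Vectorization" in the usual numeric sense does not exist for strings.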

Ideally, the first solution would be as fast as the native Python loop while taking fewer lines of code. The problem is the return value of np.char.split:

>>> a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])
>>> np.char.split(a)
array([[list(['0.1', '0.2', '0.3'])],
       [list(['0.3', '0.4', '0.5'])],
       [list(['0.5', '0.6', '0.7'])]], dtype=object)

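It returns a NumPy object array whose elements are Python lists of strings, which has to be processed further into an ordinary 2D NumPy array, and I assume this processing takes a lot of time. As hpaulj said: "[i.split() for i in a] and np.char.split(a) take basically the same time."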

There is an issue on GitHub proposing a change to this function so that it would return the following instead:

array([['0.1', '0.2', '0.3'],
       ['0.3', '0.4', '0.5'],
       ['0.5', '0.6', '0.7']], dtype='<U3')
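
With the current return value, the object array has to be unpacked by hand. One way to do this, shown here as a sketch and as an alternative to the tolist() plus np.squeeze() steps in to_numeric1, is to take the single column and rebuild a 2D array from the resulting list of lists:

```python
import numpy as np

a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])

# Object array of shape (3, 1); each element is a Python list of strings
split = np.char.split(a)

# Take the single column, turn it into a list of lists, and cast to float
b = np.array(split[:, 0].tolist(), dtype=float)
```

This assumes every row splits into the same number of values; ragged rows would not form a rectangular array.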

Answer 3

b = []
for ai in a:
    temp = []
    # use a separate name for each token so the list 'b' is not overwritten
    for s in ai[0].split(' '):
        temp.append(float(s))
    b.append(temp)

b = np.array(b)

You iterate over all the strings, split them on spaces, and cast each piece to a float.

Answer 4

You can use a nested list comprehension and then reshape the result.

b = [float(h) for j in [i[0].split(" ") for i in a] for h in j]
b = np.asarray(b).reshape(3,3)
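
The reshape(3,3) call hard-codes the shape of the sample data. A small variation, assuming every row holds the same number of values, lets NumPy infer the column count so the same code works for other sizes (the name flat is introduced here for illustration):

```python
import numpy as np

a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])

# Flatten all values from all rows into one list of floats
flat = [float(h) for i in a for h in i[0].split()]

# One row per input string; -1 lets NumPy work out the column count
b = np.asarray(flat).reshape(len(a), -1)
```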

Hope this helps.

@E. Ducateme's solution is also very compact.

Answer 5

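First, map each item in the array to a list of number strings by splitting it, then call x.astype(float) to convert them to floats:

import numpy as np

x = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])
x = np.array(list(map(lambda z: z[0].split(), x)))
y = x.astype(float)  # np.float was removed in NumPy 1.24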
print(y)

Result:

[[0.1 0.2 0.3]
 [0.3 0.4 0.5]
 [0.5 0.6 0.7]]
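
As an aside, none of the answers above uses NumPy's text parser. A possible alternative, offered only as a sketch and not benchmarked here, is to hand the strings to np.loadtxt, which accepts a list of strings and treats each one as a line of whitespace-separated values:

```python
import numpy as np

a = np.array([['0.1 0.2 0.3'], ['0.3 0.4 0.5'], ['0.5 0.6 0.7']])

# Flatten the (3, 1) array into a plain list of Python strings, one per row
lines = a.ravel().tolist()

# Parse each string as one line of whitespace-separated numbers
b = np.loadtxt(lines)
```

Note that np.loadtxt returns a 1D array when given a single line, so a one-row input may need an explicit reshape or ndmin=2.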
