How to get data from a Python generator


This is a function that uses the yield keyword.

I want to get the actual data out of the function.

How can I do that?

def gen_sequence(samples, seq_length, seq_cols):
    """Reshape features into (samples, time steps, features).

    Only sequences that meet the window length are considered; no padding is used.
    This means that for testing we need to drop those below the window length.
    An alternative would be to pad sequences so that shorter ones can be used.
    """
    # For one id, put all the rows in a single matrix.
    data_matrix = samples[seq_cols].values
    num_elements = data_matrix.shape[0]
    # Iterate over two ranges in parallel.
    # For example, if id1 has 192 rows and seq_length is 50,
    # zip pairs range(0, 142) with range(50, 192):
    # 0 50 -> rows 0 to 49 (the slice end is exclusive)
    # 1 51 -> rows 1 to 50
    # 2 52 -> rows 2 to 51
    # ...
    # 141 191 -> rows 141 to 190
    # If num_elements <= seq_length, both ranges are empty and nothing is yielded.
    for start, stop in zip(range(0, num_elements - seq_length), range(seq_length, num_elements)):
        yield data_matrix[start:stop, :]
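
To make the windowing concrete, here is a minimal sketch that runs the same generator logic on a small toy DataFrame. The DataFrame `toy`, its column names `a` and `b`, and the window length 3 are assumptions made up for illustration; they are not part of the question's data.

```python
import numpy as np
import pandas as pd


def gen_sequence(samples, seq_length, seq_cols):
    """Yield sliding windows of seq_length consecutive rows (same logic as the question)."""
    data_matrix = samples[seq_cols].values
    num_elements = data_matrix.shape[0]
    for start, stop in zip(range(0, num_elements - seq_length),
                           range(seq_length, num_elements)):
        yield data_matrix[start:stop, :]


# Hypothetical toy data: 6 rows, 2 feature columns.
toy = pd.DataFrame({"a": np.arange(6), "b": np.arange(6) * 10})

# list() drains the generator and keeps every window it yields.
windows = list(gen_sequence(toy, seq_length=3, seq_cols=["a", "b"]))
print(len(windows))      # number of windows: 6 - 3
print(windows[0].shape)  # (seq_length, number of columns)

# Stack the windows into one array of shape (samples, time steps, features).
stacked = np.array(windows)
print(stacked.shape)
```

With 6 rows and a window of 3, the generator yields 3 windows, so a window length smaller than the row count is needed to get any output at all.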

This is what I did, but I only get a list of empty lists ([]):

# generator for the sequences (arguments: samples, seq_length, seq_cols)
seq_gen = []
for serial_number in hdd['serial_number'].unique():
    temp = gen_sequence(hdd[hdd['serial_number'] == serial_number], sequence_length, sequence_cols)
    print(type(temp))
    seq_gen.append(list(temp))
# print(seq_gen)

Example of the DataFrame hdd:

            date serial_number  ...  smart_197_raw  smart_198_raw
15    2018-01-01      S30075JX  ...              0              0
509   2018-01-02      S30075JX  ...              0              0
1000  2018-01-03      S30075JX  ...              0              0
1488  2018-01-04      S30075JX  ...              0              0
1975  2018-01-05      S30075JX  ...              0              0

[5 rows x 16 columns]

hdd.columns:

    'date', 'capacity_bytes', 'serial_number', 'model', 'failure', 'smart_5_raw', 'smart_197_raw', 'smart_187_raw',
    'smart_7_raw', 'smart_1_raw', 'smart_3_raw', 'smart_9_raw', 'smart_194_raw', 'smart_189_raw',
    'smart_188_raw', 'smart_198_raw'

temp_samples = hdd[hdd['serial_number']==serial_number]

The output of print(temp_samples.shape), one line per serial number, looks like this:

(90, 16)
(90, 16)
(2, 16)
(90, 16)
(90, 16)
(90, 16)
(61, 16)
(89, 16)
(90, 16)
(89, 16)
(89, 16)
(13, 16)
(40, 16)
(36, 16)
(90, 16)
(90, 16)
(32, 16)
(90, 16)
(90, 16)
(68, 16)
(90, 16)
(57, 16)
(7, 16)
(4, 16)
(90, 16)
(90, 16)
(27, 16)
(90, 16)
(90, 16)
(50, 16)
(35, 16)
(90, 16)
(89, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(22, 16)
(49, 16)
(90, 16)
(90, 16)
(90, 16)
(88, 16)
(90, 16)
(90, 16)
(88, 16)
(44, 16)
(90, 16)
(90, 16)
(90, 16)
(89, 16)
(90, 16)
(90, 16)
(16, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(90, 16)
(86, 16)
(90, 16)
(24, 16)
(76, 16)
(36, 16)
(90, 16)
(83, 16)
(66, 16)
(50, 16)
(90, 16)
(90, 16)
(90, 16)
(73, 16)
(90, 16)
(52, 16)
(3, 16)
(90, 16)
(6, 16)
(23, 16)
(43, 16)
(42, 16)
(52, 16)
(25, 16)
(20, 16)
(11, 16)
(52, 16)
(83, 16)
(8, 16)
(34, 16)
(90, 16)
(64, 16)
(52, 16)
(90, 16)
(52, 16)
(71, 16)
(90, 16)
(28, 16)
(37, 16)
(15, 16)
(88, 16)
(90, 16)
(90, 16)
(80, 16)
(90, 16)
(26, 16)
(90, 16)
(89, 16)
(90, 16)
(90, 16)
(90, 16)
(3, 16)
(90, 16)
(90, 16)
(82, 16)
(90, 16)
(37, 16)
(90, 16)
(90, 16)
(90, 16)
(68, 16)
(10, 16)
(12, 16)
(90, 16)
(16, 16)
(1, 16)
(43, 16)
(1, 16)
(7, 16)

The value of seq_cols:

['smart_187_raw', 'failure', 'smart_5_raw', 'smart_197_raw', 'smart_194_raw', 'capacity_bytes', 'smart_7_raw', 'smart_3_raw', 'smart_189_raw', 'smart_198_raw', 'smart_9_raw', 'smart_188_raw', 'smart_1_raw']

The value of seq_length is 90.
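
The numbers above suggest why every list comes back empty: no serial number has more than 90 rows, and the generator iterates over `range(0, num_elements - seq_length)`, which is empty whenever the row count is less than or equal to the window length. The sketch below only reproduces that arithmetic. The helper name `count_windows` and the alternative window length of 30 are assumptions for illustration, not values from the question.

```python
def count_windows(num_rows, seq_length):
    """Number of windows gen_sequence would yield: len(range(0, num_rows - seq_length))."""
    return len(range(0, num_rows - seq_length))


seq_length = 90
# A few row counts taken from the shapes printed in the question.
row_counts = [90, 2, 61, 89, 13]

counts = {n: count_windows(n, seq_length) for n in row_counts}
print(counts)  # every count is 0 because no group has more than 90 rows

# Hypothetical alternative: a shorter window (30 is an arbitrary choice).
shorter = {n: count_windows(n, 30) for n in row_counts}
print(shorter)
```

Under this reading, the generator itself is working as written; it simply has nothing to yield until seq_length is smaller than the number of rows per serial number.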

python pandas dataframe generator
1 answer

Votes: 1

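If you want to get all of the data from a generator at once (rather than iterating over its values one by one), you can convert it to a list.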

Change this line:

temp = gen_sequence(hdd[hdd['serial_number']==serial_number], sequence_length, sequence_cols)

to this:

temp = list(gen_sequence(hdd[hdd['serial_number']==serial_number], sequence_length, sequence_cols))
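
As a rough illustration of what list() does with a generator, the sketch below uses a toy generator named `squares`, which is hypothetical and only stands in for gen_sequence. It shows that list() collects everything the generator yields, that a generator can be consumed only once, and that a generator with nothing to yield produces []. Note that the question's loop already calls list(temp), so wrapping the call at creation time materializes the same values.

```python
def squares(n):
    """Toy generator (hypothetical, stands in for gen_sequence)."""
    for i in range(n):
        yield i * i


gen = squares(4)
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # the generator is now exhausted
print(first_pass)        # [0, 1, 4, 9]
print(second_pass)       # []

# A generator that has nothing to yield also gives an empty list.
empty = list(squares(0))
print(empty)             # []
```

If the resulting list is empty, the cause is in what the generator yields, not in how it is converted.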