Is there a way to pass a dictionary in tf.data.Dataset w/ tf.py_func?

Question · votes: 0 · answers: 1

I'm using tf.data.Dataset in my data pipeline, and I'd like to apply some Python code with tf.py_func.

However, I've found that I cannot return a dictionary from tf.py_func. Is there a way to do this, or a workaround?

My code looks like this:

def map_func(images, labels):
    """mapping python function"""
    # do something
    # cannot be expressed as a tensor graph
    return {
        'images': images,
        'labels': labels,
        'new_key': new_value}
def tf_py_func(images, labels):
    return tf.py_func(map_func, [images, labels], [tf.uint8, tf.string], name='blah')

return dataset.map(tf_py_func)

===========================================================================

It's been a while and I had forgotten I asked this question. I solved it another way, and it was so easy that I felt almost stupid. The problem was:

  1. tf.py_func cannot return a dictionary.
  2. dataset.map can return a dictionary.

The answer is: map twice.

def map_func(images, labels):
    """mapping python function"""
    # do something
    # cannot be expressed as a tensor graph
    return processed_images, processed_labels

def tf_py_func(images, labels):
    return tf.py_func(map_func, [images, labels], [tf.uint8, tf.string], name='blah')

def _to_dict(images, labels):
    return { 'images': images, 'labels': labels }

return dataset.map(tf_py_func).map(_to_dict)
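The two-map pattern above can be exercised end to end. Below is a minimal runnable sketch using toy in-memory data; the array contents and the `+ 1` processing step are illustrative stand-ins for real work, and `tf.py_function` is the TF2 spelling of the deprecated `tf.py_func`:

```python
import numpy as np
import tensorflow as tf

# Toy in-memory data standing in for real images/labels (illustrative only)
images = np.zeros((4, 2, 2), dtype=np.uint8)
labels = np.array([b'a', b'b', b'c', b'd'])
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

def map_func(images, labels):
    # Arbitrary Python-side processing that cannot live in the graph
    return images + 1, labels

def tf_py_func(images, labels):
    # First map: the Python function returns a flat tuple of tensors
    return tf.py_function(map_func, [images, labels], [tf.uint8, tf.string])

def _to_dict(images, labels):
    # Second map: pure graph code repackages the tuple as a dictionary
    return {'images': images, 'labels': labels}

dataset = dataset.map(tf_py_func).map(_to_dict)
```

Note that `tf.py_function` drops static shape information, so the `images` tensors in the resulting dataset have unknown shape; only the dtypes given in the third argument are preserved.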
python tensorflow tensorflow-datasets
1 Answer

1 vote

You can convert the dictionary into a string that you return, and then split that string back into a dictionary.

That would look something like this:

return (images + " " + labels + " " + new_value)

Then in your other function:

l = map_func(image, label).split(" ")
d['images'] = l[0]
d['labels'] = l[1]
...
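The serialize-and-split idea could also be kept entirely inside the graph with `tf.strings` ops. A minimal sketch follows; the field values (`'img0'`, `'pos'`, etc.) are illustrative, and this approach only works for string-typed fields that contain no embedded separator:

```python
import tensorflow as tf

def pack(images, labels):
    # Serialize both fields into a single space-separated string
    return tf.strings.join([images, labels], separator=' ')

def unpack(packed):
    # Split back; splitting a scalar string yields a rank-1 string tensor
    parts = tf.strings.split(packed, sep=' ')
    return {'images': parts[0], 'labels': parts[1]}

ds = tf.data.Dataset.from_tensor_slices(
    (tf.constant(['img0', 'img1']), tf.constant(['pos', 'neg'])))
ds = ds.map(pack).map(unpack)
```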

0 votes

I was also struggling with this (I wanted to preprocess text data with non-TF functions while still keeping everything under the umbrella of TensorFlow's Dataset object). In fact, the double-map() workaround is not necessary; you only have to embed the Python function in the processing of each example.

Here is the complete example code; it was also tested on Colab (the first two lines install the dependencies).

!pip install tensorflow-gpu==2.0.0b1
!pip install tensorflow-datasets==1.0.2

from typing import Dict

import tensorflow as tf
import tensorflow_datasets as tfds

# Get a textual dataset using the 'tensorflow_datasets' library
dataset_builder = tfds.text.IMDBReviews()
dataset_builder.download_and_prepare()

# Do not randomly shuffle examples for demonstration purposes
ds = dataset_builder.as_dataset(shuffle_files=False)
training_ds = ds[tfds.Split.TRAIN]

print(training_ds)
# <_OptionsDataset shapes: {text: (), label: ()}, types: {text: tf.string, 
# label: tf.int64}>

# Print the first training example
for example in training_ds.take(1):
    print(example['text'])
    # tf.Tensor(b"As a lifelong fan of Dickens, I have ... realised.",
    # shape=(), dtype=string)

# some global configuration or object which we want to access in the
# processing function
we_want_upper_case = True


def process_string(t: tf.Tensor) -> str:
    # This function must have been called as tf.py_function which means
    # it's always eagerly executed and we can access the .numpy() content
    string_content = t.numpy().decode('utf-8')

    # Now we can do what we want in Python, i.e. upper-case or lower-case
    # depending on the external parameter.
    # Note that 'we_want_upper_case' is a variable defined in the outer scope
    # of the function! We cannot pass non-Tensor objects as parameters here.
    if we_want_upper_case:
        return string_content.upper()
    else:
        return string_content.lower()


def process_example(example: Dict[str, tf.Tensor]) -> Dict[str, tf.Tensor]:
    # I'm using typing (Dict, etc.) just for clarity, it's not necessary

    result = {}
    # First, simply copy all the tensor values
    for key in example:
        result[key] = tf.identity(example[key])

    # Now let's process the 'text' Tensor.
    # Call the 'process_string' function as 'tf.py_function'. Make sure the
    # output type matches the 'Tout' parameter (string and tf.string).
    # The inputs must be in a list: here we pass the string-typed Tensor 'text'.
    result['text'] = tf.py_function(func=process_string,
                                    inp=[example['text']],
                                    Tout=tf.string)
    return result


# We can call the 'map' function which consumes and produces dictionaries
training_ds = training_ds.map(lambda x: process_example(x))

for example in training_ds.take(1):
    print(example['text'])
    # tf.Tensor(b"AS A LIFELONG FAN OF DICKENS, I HAVE ...  REALISED.",
    # shape=(), dtype=string)