Multi-hot encoding in TensorFlow using tf.data.Dataset

Question (votes: 1, answers: 1)

I have a problem with the TF API tf.data.Dataset.from_tensor_slices().

The code below works fine:

import tensorflow as tf

features = {'letter': [['A','A'], ['C','D'], ['E','F'], ['G','A'], ['X','R']]}

letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
                "letter", ["A", "B", "C"], dtype=tf.string)

target = [1,0,1,0,1]

indicator = tf.feature_column.indicator_column(letter_feature)

def make_input_fn (X,y):
    def input_fn():
        return (X,y)
    return input_fn

# The input function returns a tuple: ( {'letter':[['A','A'],['C','D']...]}, [1,0,...] )

linear_estimator = tf.estimator.LinearClassifier(indicator)
input_fn = make_input_fn(features, target)

linear_estimator.train(input_fn)

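This basically lets me feed a column of shape (-1, 2) into the estimator model through an indicator feature_column.

Now I run into a problem with the following use case: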
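To make the shape (-1, 2) case concrete, here is a plain-Python sketch of what the indicator column is expected to produce for each row: one slot per vocabulary entry, counting how often that entry occurs. The helper name `multi_hot` is hypothetical. Ignoring out-of-vocabulary values and summing duplicates are assumptions about how `indicator_column` behaves, not a copy of TensorFlow's implementation.

```python
def multi_hot(row, vocabulary):
    """Count how often each vocabulary entry occurs in one row of categories."""
    index = {token: i for i, token in enumerate(vocabulary)}
    encoded = [0] * len(vocabulary)
    for token in row:
        # Assumption: tokens outside the vocabulary contribute nothing.
        if token in index:
            encoded[index[token]] += 1
    return encoded


vocabulary = ["A", "B", "C"]
rows = [["A", "A"], ["C", "D"], ["E", "F"], ["G", "A"], ["X", "R"]]

encoded_rows = [multi_hot(row, vocabulary) for row in rows]
for row, encoded in zip(rows, encoded_rows):
    print(row, "->", encoded)
```

Under these assumptions, rows made up entirely of letters outside the vocabulary (such as ['E','F']) end up as all zeros, so they carry no signal for the linear model.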

import pandas as pd

df_features = pd.DataFrame.from_dict(features)

######### this is the dataframe features####
#letter
#[A, A, A]
#[B, C, D]
#[B, E, F]
#[B, G, A]
#[B, X, R]

def make_input_fn (X,y):
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices((dict(X),y))
        ds = ds.shuffle(128)
        return ds
    return input_fn

linear_estimator = tf.estimator.LinearClassifier(indicator)
input_fn = make_input_fn(df_features,target)

linear_estimator.train(input_fn)

I end up with this error:


TypeError: Could not build a TypeSpec for 0    [A, A, A]
1    [B, C, D]
2    [B, E, F]
3    [B, G, A]
4    [B, X, R]
Name: letter, dtype: object with type Series ...
TypeError: Expected binary or unicode string, got ['A', 'A', 'A']
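The traceback suggests that `dict(X)` hands `from_tensor_slices` a pandas Series of dtype object whose cells are Python lists, which cannot be converted into a single rectangular tensor. One possible workaround, sketched below under that assumption, is to unpack every column into a plain list of equal-length lists first. The helper name `to_nested_lists` is hypothetical, and the sample data is a stand-in for `dict(df_features)`.

```python
def to_nested_lists(columns):
    """Turn each column (an iterable of per-row lists) into a rectangular list of lists."""
    converted = {}
    for name, column in columns.items():
        rows = [list(row) for row in column]
        widths = {len(row) for row in rows}
        # A tensor needs every row to have the same number of entries.
        if len(widths) > 1:
            raise ValueError(
                f"column {name!r} has rows of different lengths: {sorted(widths)}"
            )
        converted[name] = rows
    return converted


# Stand-in for dict(df_features): any mapping of column name -> iterable of lists works.
raw_columns = {"letter": (["A", "A", "A"], ["B", "C", "D"], ["B", "E", "F"])}
clean_columns = to_nested_lists(raw_columns)
print(clean_columns)
```

The result could then be passed as `tf.data.Dataset.from_tensor_slices((to_nested_lists(dict(df_features)), target))`, which should produce one string tensor per row instead of raising the TypeError.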

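This is really annoying, because with a large dataset I will need the tf.data.Dataset API to feed my estimator in mini-batches and, eventually, to distribute the training process.

I need a workaround for this. I thought of using a generator, but I am not sure how to implement one, and I would like to know whether there is another solution.

Thanks!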
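For reference, the generator idea could look roughly like the sketch below. It assumes a TensorFlow version that supports `output_signature` in `tf.data.Dataset.from_generator` (2.4 or later) and still ships `tf.estimator`. The names `row_generator`, `row_width` and `batch_size` are hypothetical, and the columns are assumed to be indexable by position (for example a dict of lists).

```python
def row_generator(columns, labels):
    """Yield one (features, label) pair per example."""
    names = list(columns)
    for i, label in enumerate(labels):
        yield {name: list(columns[name][i]) for name in names}, label


def make_input_fn(columns, labels, row_width, batch_size=2):
    def input_fn():
        # Imported lazily so row_generator can be tried without TensorFlow installed.
        import tensorflow as tf

        signature = (
            {name: tf.TensorSpec(shape=(row_width,), dtype=tf.string) for name in columns},
            tf.TensorSpec(shape=(), dtype=tf.int32),
        )
        ds = tf.data.Dataset.from_generator(
            lambda: row_generator(columns, labels), output_signature=signature
        )
        # Shuffle individual examples, then group them into mini-batches.
        return ds.shuffle(128).batch(batch_size)

    return input_fn


features = {'letter': [['A', 'A'], ['C', 'D'], ['E', 'F'], ['G', 'A'], ['X', 'R']]}
target = [1, 0, 1, 0, 1]

pairs = list(row_generator(features, target))
print(pairs[0])
```

A call such as `linear_estimator.train(make_input_fn(features, target, row_width=2))` would then pull examples lazily, which is the main attraction of a generator when the data does not fit in memory.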

tensorflow tensorflow-datasets tensorflow-estimator
1 answer (0 votes)
Elaborating on Richard_wth's comment and providing the complete working code for the benefit of the community.

import pandas as pd
import tensorflow as tf

features = {'letter': [['A','A'], ['C','D'], ['E','F'], ['G','A'], ['X','R']]}

df_features = pd.DataFrame.from_dict(features)

######### this is the dataframe features####
#letter
#[A, A, A]
#[B, C, D]
#[B, E, F]
#[B, G, A]
#[B, X, R]

letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
                "letter", ["A", "B", "C"], dtype=tf.string)

indicator = tf.feature_column.indicator_column(letter_feature)

target = [1,0,1,0,1]

def make_input_fn (X,y):
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices((dict(X), tf.one_hot(y, depth=2)))
        ds = ds.shuffle(128)
        return ds
    return input_fn

linear_estimator = tf.estimator.LinearClassifier(indicator)
input_fn = make_input_fn(features,target)

linear_estimator.train(input_fn, steps=2)

Happy learning!