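I have a tf.estimator that works with continuous variables, and I want to extend it to use categorical variables.
Consider a pandas DataFrame that looks like this: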
label          | con_col         | cat_col
(float 0 or 1) | (float -1 to 1) | (int 0-3)
---------------+-----------------+----------
0              |  0.123          | 2
0              |  0.456          | 1
1              | -0.123          | 3
1              | -0.123          | 3
0              |  0.123          | 2
To build the estimator for just the label and the continuous column (con_col), I construct the following feature_column variable.
feature_cols = [
    tf.feature_column.numeric_column('con_col')
]
Then I pass it to DNNClassifier like this.
tf.estimator.DNNClassifier(feature_columns=feature_cols ...)
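Later, I create a serving_input_fn(). In this function I also specify the columns. The routine is small and looks like this:
def serving_input_fn():
    feat_placeholders = {}  # maps feature name -> input placeholder
    feat_placeholders['con_col'] = tf.placeholder(tf.float32, [None])
    return tf.estimator.export.ServingInputReceiver(feat_placeholders.copy(), feat_placeholders)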
This works. However, when I try to use a categorical column, I run into problems.
With the categorical column, this part seems to work:
feature_cols = [
    tf.feature_column.sequence_categorical_column_with_identity('cat_col', num_buckets=4)
]
tf.estimator.DNNClassifier(feature_columns=feature_cols ...)
For serving_input_fn(), I took suggestions from the stack trace, but both of them failed:
def serving_input_fn():
    feat_placeholders = {}  # maps feature name -> input placeholder
    # try #2
    # this fails
    feat_placeholders['cat_col'] = tf.SequenceCategoricalColumn(categorical_column=tf.IdentityCategoricalColumn(key='cat_col', number_buckets=4, default_value=None))
    # try #1
    # this also fails
    # feat_placeholders['cat_col'] = tf.feature_column.indicator_column(tf.feature_column.sequence_categorical_column_with_identity(column, num_buckets=4))
    # try #0
    # this fails. It uses the same form as for con_col;
    # the resulting error gave hints for the code above.
    # Note: I'm using this URL as a guide. My cat_col is
    # similar to that code sample's 'dayofweek', except it
    # is not a string.
    # https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/feateng/taxifare_tft/trainer/model.py
    # feat_placeholders['cat_col'] = tf.placeholder(tf.float32, [None])
    return tf.estimator.export.ServingInputReceiver(feat_placeholders.copy(), feat_placeholders)
This is the error message I get when using try #0:
ValueError: Items of feature_columns must be a <class 'tensorflow.python.feature_column.feature_column_v2.DenseColumn'>. You can wrap a categorical column with an embedding_column or indicator_column. Given: SequenceCategoricalColumn(categorical_column=IdentityCategoricalColumn(key='cat_col', number_buckets=4, default_value=None))
Implementation of Lak's answer
Using Lak's answer as a guide, the following works for both feature columns.
# This is the list of features we pass as an argument to DNNClassifier
feature_cols = []
# Add the continuous column first
feature_cols.append(tf.feature_column.numeric_column('con_col'))
# Add the categorical column which is wrapped?
# This creates new columns from a single column?
category_feature_cols = [tf.feature_column.categorical_column_with_identity('cat_col', num_buckets=4)]
for c in category_feature_cols:
    feature_cols.append(tf.feature_column.indicator_column(c))
# now pass this list to the DNN
tf.estimator.DNNClassifier(feature_columns=feature_cols ...)
def serving_input_fn():
    feat_placeholders = {}  # maps feature name -> input placeholder
    feat_placeholders['con_col'] = tf.placeholder(tf.float32, [None])
    # the categorical ids are integers, so the placeholder is int64
    feat_placeholders['cat_col'] = tf.placeholder(tf.int64, [None])
    return tf.estimator.export.ServingInputReceiver(feat_placeholders.copy(), feat_placeholders)
You need to wrap the categorical columns before passing them to the DNN:
cat_feature_cols = [tf.feature_column.categorical_column_with_identity('cat_col', num_buckets=4)]
feature_cols = [tf.feature_column.indicator_column(c) for c in cat_feature_cols]
Use an indicator column to one-hot encode the category, or an embedding column to embed it.
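To make the difference between the two wrappers concrete, here is a minimal plain-Python sketch of the kind of dense values each one hands to the network. It does not call TensorFlow. The helper names `one_hot` and `embed` are hypothetical, and the embedding table values are made up for illustration, since a real embedding column learns them during training.

```python
# Illustrative only: plain-Python versions of what the two wrappers produce.
# NUM_BUCKETS matches num_buckets=4 from the question; the embedding table
# values are invented, because real ones are learned during training.
NUM_BUCKETS = 4


def one_hot(category_id, num_buckets=NUM_BUCKETS):
    """Return a list with 1.0 at position category_id and 0.0 elsewhere."""
    if not 0 <= category_id < num_buckets:
        raise ValueError("category id out of range: %r" % (category_id,))
    return [1.0 if i == category_id else 0.0 for i in range(num_buckets)]


# Hypothetical 4 x 2 embedding table: one 2-dimensional row per category.
EMBEDDING_TABLE = [
    [0.1, -0.3],
    [0.7, 0.2],
    [-0.5, 0.4],
    [0.0, 0.9],
]


def embed(category_id, table=EMBEDDING_TABLE):
    """Look up the dense vector stored for category_id."""
    return table[category_id]


cat_col = [2, 1, 3, 3, 2]  # the cat_col values from the table in the question
one_hot_rows = [one_hot(c) for c in cat_col]
embedded_rows = [embed(c) for c in cat_col]

print(one_hot_rows[0])   # [0.0, 0.0, 1.0, 0.0]
print(embedded_rows[0])  # [-0.5, 0.4]
```

With only four categories the one-hot form is small and needs no extra parameters, which is probably why an indicator column is enough here. An embedding tends to pay off when the number of buckets is large.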