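[Supervised classification] I am trying to train a model on data with many different categorical features, using TensorFlow and Keras. One-hot encoding is not an option because there are hundreds of distinct values. So I tried to create a feature_column.categorical_column_with_hash_bucket and then turn it into a feature_column.embedding_column.
The string values in my data should thus be converted to integers and then to 3-dimensional float vectors. During training I get this error: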
ValueError: in converted code:
    relative to C:\Users\kremer\Anaconda3\lib\site-packages\tensorflow\python\feature_column:
    feature_column_v2.py:474 call
        self._state_manager)
    feature_column_v2.py:3121 get_dense_tensor
        transformation_cache, state_manager)
    feature_column_v2.py:3488 get_sparse_tensors
        transformation_cache.get(self, state_manager), None)
    feature_column_v2.py:2562 get
        transformed = column.transform_feature(self, state_manager)
    feature_column_v2.py:3466 transform_feature
        return self._transform_input_tensor(input_tensor)
    feature_column_v2.py:3444 _transform_input_tensor
        prefix='column_name: {} input_tensor'.format(self.key))
    utils.py:58 assert_string_or_int
        '{} dtype must be string or integer. dtype: {}.'.format(prefix, dtype))
    ValueError: column_name: Artikel input_tensor dtype must be string or integer. dtype: <dtype: 'float32'>.
Here is my code:
import tensorflow as tf
from tensorflow import feature_column
from tensorflow.keras import layers
# defining feature columns:
feature_columns = []
# numeric cols
for header in ['POS', 'DAUER_RUEST', 'UNTERBRECHUNGEN_RUEST', 'DAUER_PROD', 'UNTERBRECHUNGEN_PROD', 'GUTTEILE', 'Teile_Soll', 'Stueckzeit', 'Ruestzeit_Soll']:
    feature_columns.append(feature_column.numeric_column(header))
# categorical columns with embedding
artikel = feature_column.categorical_column_with_hash_bucket(key='Artikel' , hash_bucket_size=600, dtype=tf.dtypes.string)
artikel_embedding = feature_column.embedding_column(artikel, dimension=3)
feature_columns.append(artikel_embedding)
batchnumber = feature_column.categorical_column_with_hash_bucket(key='BA' , hash_bucket_size=600, dtype=tf.dtypes.string)
batchnumber_embedding = feature_column.embedding_column(batchnumber, dimension=3)
feature_columns.append(batchnumber_embedding)
...
#five embedding columns with this design in total
...
#building and training the model
model = tf.keras.Sequential()
model.add(feature_layer)
model.add(layers.Dense(28, activation='relu'))
model.add(layers.Dense(28, activation='relu'))
model.add(layers.Dense(1))
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
early_stopping = tf.keras.callbacks.EarlyStopping(patience=3)
model.fit(train_ds,
          validation_data=val_ds,
          epochs=5,
          callbacks=[early_stopping],
          verbose=1)
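The column is declared with a string dtype in categorical_column_with_hash_bucket:
artikel = feature_column.categorical_column_with_hash_bucket(key='Artikel', hash_bucket_size=600, dtype=tf.dtypes.string)
but the error shows that the 'Artikel' values reach it as float32, so the declared dtype and the dtype of the data do not match. Declaring the column as float is not an option because, as the error message states, the column accepts only string or integer input. The values therefore have to be converted to string (or integer) before train_ds is built.
I am not very familiar with Keras, but I believe that in model.fit the 'Artikel' values in train_ds are floats. TensorFlow Estimators behave similarly: the input_fn passed to tf.estimator.TrainSpec has to deliver each feature with a specific dtype.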
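The ValueError states that the hash-bucket column accepts only string or integer input, so one way to resolve the mismatch is to convert the categorical values to strings before the dataset is built. Below is a minimal sketch of that idea. It assumes the features are held in a plain dict of lists before being wrapped into train_ds; the helper name cast_columns_to_str and the sample values are hypothetical and not taken from the question.

```python
from typing import Dict, Iterable, List


def cast_columns_to_str(
    features: Dict[str, List[object]],
    categorical_keys: Iterable[str],
) -> Dict[str, List[object]]:
    """Return a copy of `features` with the named columns converted to str."""
    keys = set(categorical_keys)
    return {
        name: [str(v) for v in values] if name in keys else list(values)
        for name, values in features.items()
    }


# Hypothetical sample rows; the real values come from the asker's data.
raw = {
    "Artikel": [4711.0, 4712.0, 4711.0],  # read as float, which the column rejects
    "POS": [1.0, 2.0, 3.0],               # numeric column, left unchanged
}

fixed = cast_columns_to_str(raw, ["Artikel"])
print(fixed["Artikel"])  # every entry is now a str, e.g. "4711.0"
```

If the data lives in a pandas DataFrame, the same conversion is usually written as df['Artikel'] = df['Artikel'].astype(str), applied to each categorical column before the DataFrame is turned into train_ds. Note that a float such as 4711.0 becomes the string "4711.0"; that is fine for hashing as long as training and inference data are converted the same way.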