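TensorFlow already provides tf.feature_column.crossed_column for creating features
by crossing columns, but it is better suited to categorical data. What about numeric data?
For example, suppose there are already two columns: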
age = tf.feature_column.numeric_column("age")
education_num = tf.feature_column.numeric_column("education_num")
If I want to create a third and a fourth feature column based on age and education_num, like this:
my_feature = age * education_num
my_another_feature = age * age
How can this be done?
You can declare a custom numeric column and compute its values in the data frame inside the input function:
# Existing features
age = tf.feature_column.numeric_column("age")
education_num = tf.feature_column.numeric_column("education_num")
# Declare a custom column just like other columns
my_feature = tf.feature_column.numeric_column("my_feature")
...
# Add to the list of features
feature_columns = [ ... age, education_num, my_feature, ... ]
...
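def input_fn():
    df_data = pd.read_csv("input.csv")
    df_data = df_data.dropna(how="any", axis=0)
    # Manually update the dataframe
    df_data["my_feature"] = df_data["age"] * df_data["education_num"]
    # labels must be a pandas Series whose index matches df_data (defined elsewhere)
    # shuffle has to be passed explicitly as a boolean
    return tf.estimator.inputs.pandas_input_fn(x=df_data,
                                               y=labels,
                                               batch_size=100,
                                               num_epochs=10,
                                               shuffle=True)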
...
model.train(input_fn=input_fn())
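The question asked for two derived features, while the snippet above only computes one. As a minimal sketch of the pandas step on its own, the following builds both columns on a small in-memory frame. The helper name add_derived_features and the sample values are made up for illustration and are not part of the original answer; only pandas is assumed.

```python
import pandas as pd


def add_derived_features(df):
    """Return a copy of df with the two product features added as numeric columns."""
    out = df.copy()
    # Element-wise product of two existing numeric columns.
    out["my_feature"] = out["age"] * out["education_num"]
    # Square of a single column, the second feature from the question.
    out["my_another_feature"] = out["age"] * out["age"]
    return out


# Small in-memory frame standing in for input.csv (values are invented).
df_data = pd.DataFrame({"age": [25, 40, 31], "education_num": [10, 13, 9]})
df_data = add_derived_features(df_data)
print(df_data)
```

Each column added this way still needs its own declaration, for example tf.feature_column.numeric_column("my_another_feature"), and has to be included in the feature columns passed to the model.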