我需要提高模型的准确性。所以我尝试使用 TabNet。 我将训练和测试数据附加在谷歌驱动器链接中
链接:https://drive.google.com/drive/folders/1ronho26m9uX9_ooBTh0M81ox1a43ika8?usp=sharing
这是我的代码。
import tensorflow as tf
import pandas as pd
# Load the train and test data into pandas dataframes
train_df = pd.read_csv("train.csv")
#train_df1 = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
# Split the target variable and the features
train_labels = train_df[[f'F_{i}' for i in range(40)]]
#train_labels=trai
test_labels = train_df.target
# Convert the dataframes to tensors
train_dataset = tf.data.Dataset.from_tensor_slices((train_df.values, train_labels.values))
test_dataset = tf.data.Dataset.from_tensor_slices((test_df.values, test_labels.values))
# Define the model using the TabNet architecture
model = tf.keras.models.Sequential([
tf.keras.layers.Input(shape=(train_df.shape[1],)),
tf.keras.layers.Dense(32, activation="relu"),
tf.keras.layers.Dense(64, activation="relu"),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dense(1)
])
# Compile the model with a mean squared error loss function and the Adam optimizer
model.compile(loss="mean_squared_error", optimizer="adam")
# Train the model on the training data
model.fit(train_dataset.batch(32), epochs=5)
# Make predictions on the test data
predictions = model.predict(test_dataset.batch(32))
#predictions = model.predict(test_dataset)
# Evaluate the model on the test data
mse = tf.keras.losses.mean_squared_error(test_labels, predictions)
print("Mean Squared Error:", mse.numpy().mean())
这是错误:
ValueError Traceback (most recent call last)
<ipython-input-40-87712e1604a9> in <module>
24
25 # Make predictions on the test data
---> 26 predictions = model.predict(test_dataset.batch(32))
27 #predictions = model.predict(test_dataset)
28
1 frames
/usr/local/lib/python3.8/dist-packages/keras/engine/training.py in tf__predict_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
ValueError: in user code:
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step **
outputs = model.predict_step(data)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
return self(x, training=False)
File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 264, in assert_input_compatibility
raise ValueError(f'Input {input_index} of layer "{layer_name}" is '
ValueError: Input 0 of layer "sequential_6" is incompatible with the layer: expected shape=(None, 42), found shape=(None, 41)
如何解决这个问题?
根据可用数据,数据集分割不正确。 test_dataset 中也没有可用的标签。所以两个数据集都有不同的形状维度。
请执行以下更改以运行代码:
将训练数据集拆分为特征和标签
train_data = train_df.iloc[:, :-1] # till 41 column
train_labels = train_df.iloc[:,-1] # Selecting the last 42 column for the target
train_data.shape, train_labels.shape
输出:
((10000, 41), (10000,))
将测试数据集拆分为特征和标签
test_data= test_df # only fetaures available , hence no changes required here
test_data.shape
输出:
(10000, 41)
(附上复制的要点供您参考)。