Google Colab error! ValueError: No loss found. You may have forgotten to provide a `loss` argument in the `compile()` method


I'm new to programming. I've only just started learning, and I'm doing it with various free tools, so I don't know much yet.

I'm trying to write a self-learning neural network.

Here is the idea: I have 3 files. The first (categ) has 1 column, named categ, containing 37 values.

The second (ex) has 2 columns. The first is called categ and contains 785 rows; the second is called fix and also contains 785 rows.

The third file (match) has a single column, named match, containing 3543 rows.

I need the match file to get a second column: for each of its values, I want to assign a value from the categories file, based on the data in the examples file.
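To make the intended mapping concrete, here is a hypothetical miniature of the three files (the values are invented; only the structure follows the description above):

import pandas as pd

# Hypothetical miniature -- invented values, same structure as the real files
df_categories = pd.DataFrame({'categ': ['fruit', 'tool']})          # categ.xlsx (37 rows in reality)
df_examples = pd.DataFrame({'categ': ['fruit', 'tool', 'fruit'],    # ex.xlsx (785 rows in reality)
                            'fix': ['apple', 'hammer', 'pear']})
df_to_distribute = pd.DataFrame({'match': ['pear', 'hammer']})      # match.xlsx (3543 rows in reality)

# Goal: add a second column next to 'match' with the category predicted
# for each value, using the labelled pairs from ex.xlsx as training data.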

At the moment, I have this code:

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.utils import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Reading downloaded Excel files
# File with categories
from google.colab import files
upload = files.upload()
!ls
df_categories = pd.read_excel('categ.xlsx', index_col=None)
print(df_categories.columns)

# File with examples
from google.colab import files
upload = files.upload()
!ls
df_examples = pd.read_excel('ex.xlsx', index_col=None)
print(df_examples.columns)

# File with values for distribution
from google.colab import files
upload = files.upload()
!ls
df_to_distribute = pd.read_excel('match.xlsx', index_col=None)
print(df_to_distribute.columns)

# Data preprocessing

categories = df_categories['categ'].tolist()
values = df_examples['fix'].tolist()
to_distribute = df_to_distribute['match'].tolist()

categories = [str(category) for category in categories]
values = [str(value) for value in values]
to_distribute = [str(item) for item in to_distribute]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(categories + values + to_distribute)
category_sequences = tokenizer.texts_to_sequences(categories)
value_sequences = tokenizer.texts_to_sequences(values)
to_distribute_sequences = tokenizer.texts_to_sequences(to_distribute)

max_length = max(len(seq) for seq in category_sequences + value_sequences + to_distribute_sequences)
padded_category_sequences = pad_sequences(category_sequences, maxlen=max_length, padding='post')
padded_value_sequences = pad_sequences(value_sequences, maxlen=max_length, padding='post')
padded_to_distribute_sequences = pad_sequences(to_distribute_sequences, maxlen=max_length, padding='post')

# Creating a model
input_layer = Input(shape=(max_length,))
embedding_layer = Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=64)(input_layer)
lstm_layer = LSTM(64)(embedding_layer)
output_layer = Dense(36, activation='softmax')(lstm_layer)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Model training
#model.fit(padded_to_distribute_sequences, padded_category_sequences, epochs=10, batch_size=32, validation_split=0.2)
#model.fit(np.array(data_list), np.array(y), verbose=0, epochs=100)
model.fit(np.array(padded_to_distribute_sequences), np.array(padded_category_sequences), verbose=0, epochs=100)

Right now I'm getting the following error, and I don't know how to fix it:

ValueError                                Traceback (most recent call last)
<ipython-input-18-4e982bc70a7f> in <cell line: 37>()
     35 #model.fit(padded_to_distribute_sequences, padded_category_sequences, epochs=10, batch_size=32, validation_split=0.2)
     36 #model.fit(np.array(data_list), np.array(y), verbose=0, epochs=100)
---> 37 model.fit(np.array(padded_to_distribute_sequences), np.array(padded_category_sequences), verbose=0, epochs=100)

1 frames
/usr/local/lib/python3.10/dist-packages/keras/src/engine/data_adapter.py in _check_data_cardinality(data)
   1958             )
   1959         msg += "Make sure all arrays contain the same number of samples."
-> 1960         raise ValueError(msg)
   1961 
   1962 

ValueError: Data cardinality is ambiguous:
  x sizes: 3549
  y sizes: 36
Make sure all arrays contain the same number of samples.

I've tried changing lines of code following suggestions from websites and forums, but nothing has helped so far. I'd be glad of any help!

I'm writing the code in Google Colab.

Unfortunately, I can't share the original files I'm using because they contain personal data, but I can share a short summary so that the logic of what I'm doing is clear. I've attached it at the end of the description.

An example of the files I use:

tensorflow opencv keras deep-learning conv-neural-network
1 Answer

I think the problem is that the target data padded_category_sequences and the input data padded_to_distribute_sequences have different numbers of samples, which is what causes the ValueError.

After the "Data preprocessing" section, add:

target_data = np.tile(padded_category_sequences, (len(padded_to_distribute_sequences) // len(padded_category_sequences), 1))

I'm assuming padded_category_sequences is your target data.
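For reference, here is what that np.tile call produces with the row counts from the traceback (this is just a shape check with zero arrays, not real data):

import numpy as np

# Stand-ins with the row counts from the traceback
targets = np.zeros((36, 10))    # like padded_category_sequences (y sizes: 36)
inputs = np.zeros((3549, 10))   # like padded_to_distribute_sequences (x sizes: 3549)

reps = len(inputs) // len(targets)          # 3549 // 36 == 98
target_data = np.tile(targets, (reps, 1))   # shape (3528, 10)

# 36 * 98 == 3528, so 21 rows are still unmatched; the tiled targets
# would need trimming or padding before they line up with x exactly.
print(target_data.shape, inputs.shape)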
