Schema definition inconsistency when loading data into Milvus vector DB

I am trying to load two .wav audio files into a vector database (Milvus), but I keep getting a schema inconsistency error.

Explanation of the errors:

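Failed to create collection: the error message indicates a problem parsing the schema dictionary. It occurs when the script tries to create the collection in Milvus, so the problem is probably in how the schema is built or in how it is passed to the Milvus client.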
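One possible reading of this first error, sketched with a stand-in function rather than the real client: assuming `MilvusClient.create_collection` takes `dimension` as its second positional parameter and accepts `schema` only further along the parameter list (as in pymilvus 2.4; check your installed version), a schema passed positionally ends up bound to `dimension`. That would explain why the server tries to parse the schema's text as an integer. The function below is hypothetical and only mirrors that assumed parameter order.

```python
# Stand-in that mirrors the ASSUMED parameter order of
# MilvusClient.create_collection (pymilvus 2.4). It is hypothetical and
# only reports which parameter each argument was bound to.
def create_collection(collection_name, dimension=None, primary_field_name="id",
                      id_type="int", vector_field_name="vector",
                      metric_type="COSINE", auto_id=False, timeout=None,
                      schema=None, index_params=None, **kwargs):
    return {"dimension": dimension, "schema": schema}


# Placeholder for the CollectionSchema object built in the question.
schema = {"fields": ["id", "vector", "another_vector"]}

# Call style used in the question: the schema is the second positional argument.
positional = create_collection("audio_collection", schema)

# Keyword call: the schema reaches the parameter that is meant to receive it.
keyword = create_collection("audio_collection", schema=schema)

print("positional call bound schema to dimension:", positional["dimension"] is schema)
print("keyword call bound schema to schema:", keyword["schema"] is schema)
```

If that assumption holds for your version, calling `client.create_collection(collection_name=collection_name, schema=schema)` with keyword arguments avoids the mix-up.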

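Error inserting data: after collection creation fails, the script still tries to insert data into the collection. The input data does not match the defined schema, either because the data is in the wrong format or because its data types differ from what the schema expects.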
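A quick way to look for this kind of mismatch is to compare each row with the fields the schema declares before calling `insert`. The sketch below is only an illustration: `EXPECTED_FIELDS` and `find_row_problems` are hypothetical names, and the field list is copied by hand from the schema in the question rather than read from Milvus. Whether the missing field is what actually triggers the `DataNotMatchException` here is an assumption.

```python
# Hand-written description of the schema from the question:
# field name -> (kind, vector dimension or None). Hypothetical helper data.
EXPECTED_FIELDS = {
    "id": ("varchar", None),
    "vector": ("float_vector", 20),
    "another_vector": ("float_vector", 20),
}


def find_row_problems(row, expected=EXPECTED_FIELDS):
    """Return a list of human-readable mismatches between a row and the schema."""
    problems = []
    for name, (kind, dim) in expected.items():
        if name not in row:
            problems.append(f"missing field: {name}")
            continue
        value = row[name]
        if kind == "varchar" and not isinstance(value, str):
            problems.append(f"{name}: expected a string")
        if kind == "float_vector":
            if not isinstance(value, list) or len(value) != dim:
                problems.append(f"{name}: expected a list of {dim} floats")
    return problems


# Same shape as the row built in the question: no "another_vector" key.
row = {"id": "ES2002a.wav", "vector": [0.0] * 20}
print(find_row_problems(row))
```

For the row shape used in the question, the check reports that `another_vector` is missing, so either the row needs a value for that field or the schema should declare only one vector field.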

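Successfully loaded 0 audio files into Milvus: finally, despite the insert attempts, the script reports that no audio files were loaded into Milvus. This is most likely a consequence of the errors above, which prevented the inserts from completing.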

Here is my code:

from pymilvus import MilvusClient, DataType, FieldSchema, CollectionSchema
from pymilvus.exceptions import MilvusException
import librosa
import numpy as np
import os

# Connect to Milvus server
client = MilvusClient(uri="http://localhost:19530")

# Define field schemas
id_field = FieldSchema(name="id", dtype=DataType.VARCHAR, is_primary=True, description="Primary key field")
vector_field = FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=20, description="Feature vector")
# Add another vector field explicitly
another_vector_field = FieldSchema(name="another_vector", dtype=DataType.FLOAT_VECTOR, dim=20, description="Another feature vector")

# Define collection schema with dynamic field enabled
schema = CollectionSchema(fields=[id_field, vector_field, another_vector_field], enable_dynamic_field=True)

# Create collection
collection_name = "audio_collection"
try:
    client.create_collection(collection_name, schema)
    print("Collection created successfully.")
except MilvusException as e:
    print(f"Failed to create collection: {e}")

# Directory containing audio files
audio_dir = "/test-wav"

# Function to extract features from audio files
def extract_features(audio_path):
    y, sr = librosa.load(audio_path)  # Load audio file
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # Extract MFCC features
    mfccs_mean = np.mean(mfccs, axis=1)  # Take mean along each feature to get a fixed-sized vector
    mfccs_mean_normalized = mfccs_mean / np.linalg.norm(mfccs_mean)  # Normalize vector
    return mfccs_mean_normalized.tolist()  # Convert to list for insertion

# Insert audio features into Milvus
audio_files = ["ES2002a.wav", "ES2002b.wav"]  # List of audio files to load
for file_name in audio_files:
    audio_path = os.path.join(audio_dir, file_name)
    features = extract_features(audio_path)
    data = {"id": file_name, "vector": features}
    try:
        client.insert(collection_name=collection_name, data=[data])
        print(f"Successfully inserted data for {file_name}")
    except MilvusException as e:
        print(f"Error inserting data: {e}")

# Check if data is successfully loaded
collection_stats = client.get_collection_stats(collection_name=collection_name)
num_entities = collection_stats["row_count"]
print(f"Successfully loaded {num_entities} audio files into Milvus.")

# Disconnect from Milvus server
client.close()

But I keep getting this error:


RPC error: [create_collection], <MilvusException: (code=65535, message=strconv.ParseInt: parsing "{'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': 'Primary key field', 'type': <DataType.VARCHAR: 21>, 'is_primary': True, 'auto_id': False}, {'name': 'vector', 'description': 'Feature vector', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 20}}, {'name': 'another_vector', 'description': 'Another feature vector', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 20}}], 'enable_dynamic_field': True}": invalid syntax)>, <Time:{'RPC start': '2024-04-06 00:10:23.547920', 'RPC error': '2024-04-06 00:10:23.549530'}>
Failed to create collection: audio_collection
Failed to create collection: <MilvusException: (code=65535, message=strconv.ParseInt: parsing "{'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': 'Primary key field', 'type': <DataType.VARCHAR: 21>, 'is_primary': True, 'auto_id': False}, {'name': 'vector', 'description': 'Feature vector', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 20}}, {'name': 'another_vector', 'description': 'Another feature vector', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 20}}], 'enable_dynamic_field': True}": invalid syntax)>
RPC error: [insert_rows], <DataNotMatchException: (code=1, message=The Input data type is inconsistent with defined schema, please check it.)>, <Time:{'RPC start': '2024-04-06 00:10:26.571256', 'RPC error': '2024-04-06 00:10:26.573959'}>
Error inserting data: <DataNotMatchException: (code=1, message=The Input data type is inconsistent with defined schema, please check it.)>
RPC error: [insert_rows], <DataNotMatchException: (code=1, message=The Input data type is inconsistent with defined schema, please check it.)>, <Time:{'RPC start': '2024-04-06 00:10:29.246277', 'RPC error': '2024-04-06 00:10:29.248700'}>
Error inserting data: <DataNotMatchException: (code=1, message=The Input data type is inconsistent with defined schema, please check it.)>
Successfully loaded 0 audio files into Milvus.

I am trying to load 2 audio files (.wav) into a Milvus database, but I keep running into schema definition problems.

python database vector milvus
1 answer
0 votes

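Which Milvus version are you using? Versions before 2.4 support only one vector field per collection. That said, the error message is somewhat confusing.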
