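I need some help. I have been trying all day to embed documents under upgraded versions of langchain (using text-embedding-3-large embeddings), and I am unable to resolve the issue above. I have tried everything I have seen online: downgrading azure-search-documents, downgrading langchain, and so on. I receive either that error or this one:
(InvalidRequestParameter) The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'vectorSearchProfile' set.
Code: InvalidRequestParameter
Did you solve it?
Here is my current setup: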
azure-core==1.29.7
azure-search-documents==11.4.0b8
langchain==0.1.8 (0.2.0 also fails)
langchain-core==0.2.1
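Because this error depends on which package versions are actually installed, it can help to read them from the running environment instead of from a requirements file. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    wanted = ["azure-core", "azure-search-documents", "langchain", "langchain-core"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}=={version}" if version else f"{name} is not installed")
```

Running this inside the same virtual environment as the failing script shows whether a downgrade or upgrade actually took effect.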
Here is my code:
def set_vector_fields():
    return [
        SimpleField(name="id",type=SearchFieldDataType.String,key=True,filterable=True,),
        SearchableField(name="content",type=SearchFieldDataType.String,searchable=True,),
        SearchField(name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=dimensionality,
            searchable=True,
            vector_search_configuration="hnsw_config"
            #vector_search_profile_name = "profile_hnsw_config",
            #vectorSearchProfile="profile_hnsw_config"
        ),
        # Additional fields for metadata. Customize as needed based on the structure of the data. See additional footnotes for details
        SearchableField(name="metadata", type=SearchFieldDataType.String, searchable=True,filterable=True,),
        SearchableField(name="id_embedding",type=SearchFieldDataType.String, searchable=True,filterable=True,),
        SimpleField(name="last_update",type=SearchFieldDataType.DateTimeOffset,searchable=True,filterable=True,),
        SearchableField(name="chunk_no",type=SearchFieldDataType.Double,searchable=True,filterable=True,),
        # Below lists the additional metadata fields added from the CSV file
        SearchableField(name="Filename",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        SearchableField(name="Subject",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        SearchableField(name="Year",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        SearchableField(name="Source",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        # Date fields
        SimpleField(name="Date_File",type=SearchFieldDataType.DateTimeOffset,searchable=True,filterable=True,),
        SimpleField(name="Last_Update_Embedding",type=SearchFieldDataType.DateTimeOffset,searchable=True,filterable=True,)
    ]
#def set_vector_search_config_new():
#    return VectorSearch(algorithms=[HnswAlgorithmConfiguration(
#        name="hnsw_config",
#        kind=VectorSearchAlgorithmKind.HNSW,
#        parameters=HnswParameters(m=8, metric="cosine", ef_construction=400, ef_search=500))],
#        profiles=VectorSearchProfile(name="profile_hnsw_config",algorithm_configuration_name ="hnsw_config" )
#    )
def set_vector_search_config():
    return VectorSearch(algorithm_configurations=[HnswVectorSearchAlgorithmConfiguration(
        name="hnsw_config",
        kind="hnsw",
        parameters=HnswParameters(m=8, metric="cosine", ef_construction=400, ef_search=500))],
    )
The routine fails when I try to set up the vector store (in Azure AI Search):
def set_vectorstore(index_name):
    embeddings,embedding_function = set_embedding_function()
    fields= set_vector_fields()
    sc_name,scoring_profile = define_scoring_profile()
    # NOTE: IT FAILS HERE, WHEN IT ATTEMPTS TO BUILD THE VECTOR.
    # I can create the vector, but I can't upload documents via vectorstore.add_documents
    vectorstore: AzureSearch = AzureSearch(
        azure_search_endpoint=azure_search_endpoint,
        azure_search_key=azure_search_key,
        index_name=index_name,
        embedding_function=embedding_function,
        search_type=search_type_GPT,
        fields=fields,
        scoring_profiles = scoring_profile,
        default_scoring_profile = sc_name,
    )
    return vectorstore
I would appreciate any help.
Thanks
I have tried all the solutions I could find online.
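You just need to set the vector search profile on the vector field, and create a vector search configuration that defines the profile, the algorithm, and the vectorizer.
Use the code below to define the index fields.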
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    SearchableField,
    SimpleField,
)

def set_vector_fields():
    return [
        SimpleField(name="id",type=SearchFieldDataType.String,key=True,filterable=True,),
        SearchableField(name="content",type=SearchFieldDataType.String,searchable=True,),
        SearchField(name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=1536,  # 1 - 3072
            # The profile name must match a profile defined in the vector search configuration
            vector_search_profile_name="profile_hnsw_config"
        ),
        SearchableField(name="metadata", type=SearchFieldDataType.String, searchable=True,filterable=True,),
        SearchableField(name="id_embedding",type=SearchFieldDataType.String, searchable=True,filterable=True,),
        SimpleField(name="last_update",type=SearchFieldDataType.DateTimeOffset,searchable=True,filterable=True,),
        SearchableField(name="chunk_no",type=SearchFieldDataType.Double,searchable=True,filterable=True,),
        # Below lists the additional Metadata Fields added from the CSV file
        SearchableField(name="Filename",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        SearchableField(name="Subject",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        SearchableField(name="Year",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        SearchableField(name="Source",type=SearchFieldDataType.String,searchable=True,filterable=True,),
        # Date Fields
        SimpleField(name="Date_File",type=SearchFieldDataType.DateTimeOffset,searchable=True,filterable=True,),
        SimpleField(name="Last_Update_Embedding",type=SearchFieldDataType.DateTimeOffset,searchable=True,filterable=True,)
    ]
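Here, the dimension can be anywhere in the range 1-3072 because you are using text-embedding-3-large. The value has to match the length of the vectors your embedding deployment actually returns, which is 3072 by default for this model unless a smaller size is requested. Refer to the linked documentation for more information.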
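A quick way to catch a mismatch between the field's `vector_search_dimensions` and the embedding size is to check a few vectors before uploading. This is a minimal sketch, assuming the embeddings are available as plain lists of floats; the helper name `check_embedding_dimensions` is hypothetical, and the 1-3072 bound is taken from the range given above.

```python
def check_embedding_dimensions(vectors, expected_dimensions):
    """Raise ValueError if any vector's length differs from the index field's dimensions."""
    if not 1 <= expected_dimensions <= 3072:
        raise ValueError("expected_dimensions must be between 1 and 3072")
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dimensions:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"but the index field expects {expected_dimensions}"
            )
    return True


if __name__ == "__main__":
    # Stand-in vectors; in practice these would come from the embedding function.
    sample = [[0.0] * 1536, [0.1] * 1536]
    print(check_embedding_dimensions(sample, 1536))
```

If the check fails, either change the field's dimensions in the index definition or request a matching size from the embedding model.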
For the vector search configuration, use the following code, which also includes a vectorizer.
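# These names are the ones used below; whether they are available depends on
# the installed azure-search-documents version.
from azure.search.documents.indexes.models import (
    AzureOpenAIParameters,
    AzureOpenAIVectorizer,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorSearch,
    VectorSearchAlgorithmKind,
    VectorSearchProfile,
)

def set_vector_search_config_new():
    return VectorSearch(
        algorithms=[
            HnswAlgorithmConfiguration(
                name="hnsw_config",
                kind=VectorSearchAlgorithmKind.HNSW,
                parameters=HnswParameters(m=8, metric="cosine", ef_construction=400, ef_search=500))],
        profiles=[
            # The profile ties the algorithm and the vectorizer together by name
            VectorSearchProfile(name="profile_hnsw_config",algorithm_configuration_name ="hnsw_config" ,vectorizer="myOpenAI")],
        vectorizers=[
            AzureOpenAIVectorizer(
                name="myOpenAI",
                kind="azureOpenAI",
                azure_open_ai_parameters=AzureOpenAIParameters(
                    resource_uri=azure_openai_endpoint,
                    deployment_id=azure_openai_embedding_deployment,
                    api_key=azure_openai_key,
                ),
            ),
        ]
    )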
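The configuration above only works if the names line up: the vector field points at a profile, the profile points at an algorithm and optionally a vectorizer. The sketch below checks those references on a plain dictionary. It assumes a dict layout modelled on the JSON form of an index definition (`vectorSearchProfile` on the field, `algorithm` and `vectorizer` on the profile); the helper `find_vector_config_problems` is hypothetical and does not call the Azure SDK.

```python
def find_vector_config_problems(index_definition):
    """Return a list of human-readable problems found in a dict-shaped index definition."""
    vector_search = index_definition.get("vectorSearch", {})
    algorithm_names = {a["name"] for a in vector_search.get("algorithms", [])}
    vectorizer_names = {v["name"] for v in vector_search.get("vectorizers", [])}
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}

    problems = []
    for name, profile in profiles.items():
        algorithm = profile.get("algorithm")
        if algorithm not in algorithm_names:
            problems.append(f"profile '{name}' references unknown algorithm '{algorithm}'")
        vectorizer = profile.get("vectorizer")
        if vectorizer is not None and vectorizer not in vectorizer_names:
            problems.append(f"profile '{name}' references unknown vectorizer '{vectorizer}'")

    for field in index_definition.get("fields", []):
        if "dimensions" not in field:
            continue  # treated as a non-vector field in this sketch
        profile_name = field.get("vectorSearchProfile")
        if profile_name is None:
            problems.append(f"vector field '{field['name']}' has no vectorSearchProfile")
        elif profile_name not in profiles:
            problems.append(
                f"vector field '{field['name']}' references unknown profile '{profile_name}'"
            )
    return problems


if __name__ == "__main__":
    definition = {
        "fields": [
            {"name": "id"},
            {"name": "content_vector", "dimensions": 1536,
             "vectorSearchProfile": "profile_hnsw_config"},
        ],
        "vectorSearch": {
            "algorithms": [{"name": "hnsw_config", "kind": "hnsw"}],
            "profiles": [{"name": "profile_hnsw_config", "algorithm": "hnsw_config",
                          "vectorizer": "myOpenAI"}],
            "vectorizers": [{"name": "myOpenAI", "kind": "azureOpenAI"}],
        },
    }
    print(find_vector_config_problems(definition) or "no problems found")
```

A vector field with no profile, as in the question, is reported by this check in the same spirit as the service's 'vectorSearchProfile' error.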
Then create the index using the code below.
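from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import SearchIndex

# service_endpoint, credential, and index_name are assumed to be defined elsewhere
index_client = SearchIndexClient(endpoint=service_endpoint, credential=credential)
index = SearchIndex(name=index_name, fields=set_vector_fields(), vector_search=set_vector_search_config_new())
result = index_client.create_or_update_index(index)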
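Refer to the linked documentation for more information.
The import error for 'ExhaustiveKnnAlgorithmConfiguration' from 'azure.search.documents.indexes.models' is caused by a package version problem. Try updating the azure-search-documents package to 11.6.0b1.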
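To confirm whether an upgrade fixed the import, you can probe for the class directly instead of waiting for the application to fail. A minimal sketch using only the standard library; the helper name `model_is_available` is hypothetical, and the default module path is the one named in the error.

```python
import importlib


def model_is_available(class_name, module_name="azure.search.documents.indexes.models"):
    """Return True if `class_name` can be found in `module_name`, False otherwise."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # the package (or this submodule) is not installed
    return hasattr(module, class_name)


if __name__ == "__main__":
    for name in ("ExhaustiveKnnAlgorithmConfiguration", "HnswAlgorithmConfiguration"):
        status = "available" if model_is_available(name) else "missing"
        print(f"{name}: {status}")
```

If a class is reported as missing, the installed azure-search-documents version does not match what the calling code expects.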