Getting RequestError(400, 'search_phase_execution_exception', 'runtime error') when computing similarity

Question

I am trying to do semantic search with Elasticsearch using tensorflow_hub, but I get

RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

From search_phase_execution_exception I suspect that the data is corrupted (based on this Stack Overflow question). My document structure looks like this:

{
"settings": {
  "number_of_shards": 2,
  "number_of_replicas": 1
},
 "mappings": {
  "dynamic": "true",
  "_source": {
    "enabled": "true"
  },
  "properties": {
        "id": {
            "type":"keyword"
        },
        "title": {
            "type": "text"
        },
        "abstract": {
            "type": "text"
        },
        "abs_emb": {
            "type":"dense_vector",
            "dims":512
        },
        "timestamp": {
            "type":"date"
        }
    }
}
}

I use elasticsearch.indices.create to create the index, and then index the documents:

es.indices.create(index=index, body='my_document_structure')
res = es.indices.delete(index=index, ignore=[404])
for i in range(100):
  doc = {
    'timestamp': datetime.datetime.utcnow(),
    'id':id[i],
    'title':title[0][i],
    'abstract':abstract[0][i],
    'abs_emb':tf_hub_KerasLayer([abstract[0][i]])[0]
  }
  res = es.index(index=index, body=doc)

For my semantic search I use this code:

query = "graphene"
query_vector = list(embed([query])[0])

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, doc['abs_emb']) + 1.0",
            "params": {"query_vector": query_vector}
        }
    }
}

response = es.search(
    index=index,
    body={
        "size": 5,
        "query": script_query,
        "_source": {"includes": ["title", "abstract"]}
    }
)

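I know there are some similar questions on Stack Overflow and the Elasticsearch forum, but I couldn't find a solution that works for me. My guess is that the document structure is wrong, but I can't figure out what exactly. I used the search query code from this repository. The full error message is too long and doesn't seem to contain much information, so I'm sharing only the last part.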

~/untitled/elastic/venv/lib/python3.9/site-packages/elasticsearch/connection/base.py in 
_raise_error(self, status_code, raw_data)
320             logger.warning("Undecodable raw error response from server: %s", err)
321 
--> 322         raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
323             status_code, error_message, additional_info
324         )

RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

This is the error from the Elasticsearch server.

[2021-04-29T12:43:07,797][WARN ][o.e.c.r.a.DiskThresholdMonitor] 
[asmac.local] high disk watermark [90%] exceeded on 
[w7lUacguTZWH9xc_lyd0kg][asmac.local][/Users/username/elasticsearch- 
7.12.0/data/nodes/0] free: 17.2gb[7.4%], shards will be relocated 
away from this node; currently relocating away shards totalling [0] 
bytes; the node is expected to continue to exceed the high disk 
watermark when these relocations are complete

Answer 1

I think you are hitting the following issue, and you should update your query to:

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, 'abs_emb') + 1.0",
            "params": {"query_vector": query_vector}
        }
    }
}

Also make sure that query_vector contains floats rather than doubles.
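The two points above can be sketched as small helpers. This is a minimal sketch, not the answerer's code: the names to_query_vector and build_script_query are hypothetical, and reading "floats" as "plain Python floats that serialize cleanly to JSON" is an assumption about what the advice means.

```python
from typing import Iterable, List


def to_query_vector(embedding: Iterable) -> List[float]:
    """Convert an embedding (list, NumPy array, tensor row) to plain Python floats."""
    # Assumption: each element can be passed to float(), which holds for
    # Python numbers and NumPy scalars.
    return [float(x) for x in embedding]


def build_script_query(field: str, query_vector: List[float]) -> dict:
    """Build a script_score query that names the vector field as a string."""
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                # The field is passed as 'field', not as doc['field'].
                "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
```

With the question's names, the result of build_script_query("abs_emb", to_query_vector(embed([query])[0])) would take the place of script_query in the es.search call.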

Answer 2

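In my case the error was "Caused by: java.lang.ClassCastException: class org.elasticsearch.index.fielddata.ScriptDocValues$Doubles cannot be cast to class org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues"

My mistake was that I deleted the ES index before I started ingesting content, the one that had the "type": "dense_vector" field.

Because the index no longer existed, ES re-created it on the first indexed document with a dynamic mapping and did not use the correct type for the dense vectors: they were stored as useless lists of doubles. In that sense the ES index was "corrupted": every "script_score" query returned 400.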
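A minimal sketch of the safer ordering (delete first, then create) plus a check of the resulting mapping. It assumes the 7.x Python client used in the question, where indices.delete accepts ignore=[404] and indices.create accepts body; recreate_index and mapped_type are hypothetical helper names.

```python
def recreate_index(es, index: str, body: dict) -> None:
    """Drop any old index first, then create it with the explicit mapping."""
    es.indices.delete(index=index, ignore=[404])  # a missing index is not an error
    es.indices.create(index=index, body=body)


def mapped_type(mapping_response: dict, index: str, field: str):
    """Return the mapped type of `field` from a get_mapping response, or None."""
    # Assumed response layout: {index: {"mappings": {"properties": {...}}}}
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    return properties.get(field, {}).get("type")
```

Before indexing any documents, mapped_type(es.indices.get_mapping(index=index), index, "abs_emb") should return "dense_vector". Any other value, such as "float", suggests that the dynamic mapping took over.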

Answer 3

For me, the problem was that I was using elastiknn_dense_float_vector instead of dense_vector, which is still an open issue. I am converting my vector index to use dense_vector instead: https://github.com/alexklibisz/elastiknn/issues/323

Answer 4

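I had a similar problem because I was using doc['text_vector'] instead of 'text_vector'.

After adding json.dumps to print the exception, I found that the 'text_vector' field was not a dense_vector, as this error message shows: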

class org.elasticsearch.index.fielddata.ScriptDocValues$Doubles cannot be cast to class org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues

To fix this error, I had to create the index with the mappings field set to:

{ "properties": { "text_vector": { "type": "dense_vector", "dims": 3 } } }

dims is the size of the vector (the number of elements in the vector).
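Since dims has to match the length of every indexed vector, one option is to derive the mapping from a sample embedding. A minimal sketch, assuming the sample supports len(); dense_vector_mapping is a hypothetical helper name, not part of the Elasticsearch client.

```python
def dense_vector_mapping(field: str, sample_vector) -> dict:
    """Build a mappings body whose dims equals the length of a sample vector."""
    dims = len(sample_vector)
    if dims == 0:
        raise ValueError("sample_vector must not be empty")
    return {"properties": {field: {"type": "dense_vector", "dims": dims}}}
```

For the three-element example vector, dense_vector_mapping("text_vector", [4.2, 3.4, -0.2]) produces the same mappings body as the one written by hand above.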

Function to create an index with the field type mapping:

def create():
    index = 'text_index'
    body = {
        "settings": {},
        "mappings": { "properties": { "text_vector": { "type": "dense_vector", "dims": 3 } } }
    }
    es.indices.create(index=index, body=body)

    click.echo(f"Index {index} is created with settings {json.dumps(body, indent=4)}")

Function to index an arbitrary text string:

def index(input_str):
    # text_embedding = embed([input_str])[0].numpy().tolist()
    text_embedding = [4.2, 3.4, -0.2]

    body = {'text': input_str, 'text_vector': text_embedding}
    
    res = es.index(index='text_index', body=body)
    click.echo(f"Indexed {input_str} with id {res['_id']}")

Function to perform a vector search for any text string using Elasticsearch:

def search(search_string):
    # search_vector = embed([search_string])[0].numpy().tolist()
    search_vector = [4.2, 3.4, -0.2]

    body = {
        'query': {
            'script_score': {
                'query': {'match_all': {}},
                'script': {
                    'source': "cosineSimilarity(params.query_vector, 'text_vector') + 1.0",
                    'params': {'query_vector': search_vector}
                }
            }
        }
    }
    try:
        res = es.search(index='text_index', body=body)
        click.echo("Search results:")
        for doc in res['hits']['hits']:
            click.echo(f"{doc['_id']} {doc['_score']}: {doc['_source']['text']}")
    except Exception as inst:
        print(type(inst))
        print(json.dumps(inst.args, indent=4))

Full description of the problem: https://github.com/Konard/elastic-search/issues/3
Full source code: https://github.com/Konard/elastic-search/commit/1df0748dd8e8a37c29e1d128eedf96d074e5a73f
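The short 'runtime error' message hides the real cause, which this answer recovered by dumping the exception arguments. A minimal sketch of pulling every nested type/reason pair out of such an error body. iter_error_reasons is a hypothetical helper, and the nested layout (failed_shards, caused_by) is an assumption based on typical Elasticsearch 7.x error responses.

```python
def iter_error_reasons(node):
    """Yield (type, reason) pairs found anywhere in a nested error body."""
    if isinstance(node, dict):
        # Only report entries whose reason is a plain string; nested reason
        # objects are visited by the recursion below.
        if "type" in node and isinstance(node.get("reason"), str):
            yield node["type"], node["reason"]
        for value in node.values():
            yield from iter_error_reasons(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_error_reasons(item)
```

In the except branch of the search function, this could be applied to inst.args, or to the exception's info attribute, which, as far as I know, holds the parsed response body in the 7.x Python client.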
