Elasticsearch - Multiple fields as the mapping ID in Spark

Question

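I am very new to Elasticsearch. I am using elasticsearch-hadoop version 6.2.4. I read files from HDFS, convert them to bean objects, and write them to Elasticsearch using Spark Structured Streaming: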

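StreamingQuery query = dataSet
                        .writeStream()
                        .format("org.elasticsearch.spark.sql")
                        //.outputMode(OutputMode.Append())
                        // Forward slashes: "\tmp\ckpt1" is not a valid Java string literal ("\c" is an illegal escape)
                        .option("checkpointLocation", "/tmp/ckpt1")
                        .option("es.nodes", "abc.dev.cm.par.xy.hp")
                        .option("es.port", "9200")
                        // The value of this field becomes the document _id
                        .option("es.mapping.id", "CustomerID")
                        // Elasticsearch index names must be lowercase
                        .option("es.resource", "testindex/testType")
                        .start();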

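When writing, I pass one field of the POJO class (CustomerID) as the mapping ID. Can we use multiple fields, or a combination of fields, as the mapping ID? For example, my file contains both a CustomerID field and an OrderID field. Can we combine the two, as in CustomerID + OrderID?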

Answer 1

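No, you cannot set multiple properties as "es.mapping.id". What you can do is build whatever composite ID you need, append it to the DataFrame as a new column, and point "es.mapping.id" at that column.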

Answer 2

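According to the Elastic documentation, the es.mapping.id option takes a single column name, so you cannot set multiple columns as the ID. You can work around this by creating a new column that holds the combined value, as follows: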

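Dataset<Row> withId = dataSet.withColumn("id", functions.concat_ws("_", functions.col("CustomerID"), functions.col("OrderID")));

Here functions is org.apache.spark.sql.functions. Adding two Columns (Column.plus) is numeric addition, not string concatenation, so concat_ws is used to join the two values as strings. Then write withId with .option("es.mapping.id", "id").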
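Why a separator is worth including: concatenating the raw values can map different (CustomerID, OrderID) pairs to the same _id, and with the default index write operation a second document with the same _id replaces the first. The plain-Java sketch below has no Spark dependency; the class name CompositeIdDemo and the "_" separator are illustrative assumptions, and the separator should be a character that cannot appear in either ID. It builds the same key that concat_ws("_", ...) produces for non-null values.

```java
import java.util.Objects;

// Hypothetical helper, for illustration only.
class CompositeIdDemo {
    // Assumed separator; choose one that never occurs inside CustomerID or OrderID.
    static final String SEPARATOR = "_";

    // Same result as concat_ws("_", CustomerID, OrderID) for non-null inputs.
    // concat_ws silently skips nulls, so this sketch rejects them instead.
    static String compositeId(String customerId, String orderId) {
        Objects.requireNonNull(customerId, "CustomerID must not be null");
        Objects.requireNonNull(orderId, "OrderID must not be null");
        return customerId + SEPARATOR + orderId;
    }

    public static void main(String[] args) {
        // Plain concatenation collides: both lines print "123".
        System.out.println("1" + "23");
        System.out.println("12" + "3");
        // With a separator the IDs stay distinct: "1_23" and "12_3".
        System.out.println(compositeId("1", "23"));
        System.out.println(compositeId("12", "3"));
    }
}
```

Rows with a null CustomerID or OrderID deserve a decision of their own (filter them out or substitute a placeholder), because concat_ws would collapse them to a shorter key that may match another row.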
