Dynamic index names

Question

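I'm using an AWS Glue ETL job to move data from an S3 data source into OpenSearch. The job itself works, but I'm stuck on writing the data to multiple OpenSearch indices. Specifically, I need the index name to come from the value of one of the fields in the source JSON files. Can anyone help me with this? Thanks!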

Answer 1

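You can partition the data by the value of the field that holds the index name and use bulk inserts for better performance. Also remember to enable compression on the write path to improve write efficiency. I'd recommend using the SDK directly.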
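As a rough illustration of the SDK route, here is a minimal sketch of how each record could be routed to an index taken from one of its own fields. The helper name `build_bulk_actions` and the field name `index` are assumptions chosen for illustration; the action shape (`_index` plus `_source`) is the one the bulk helper in opensearch-py accepts.

```python
from typing import Any, Dict, Iterable, List


def build_bulk_actions(
    records: Iterable[Dict[str, Any]], index_field: str = "index"
) -> List[Dict[str, Any]]:
    """Build one bulk action per record, using the record's own field as the index name.

    Hypothetical helper: `index_field` is assumed to be present in every record.
    """
    actions = []
    for record in records:
        actions.append(
            {
                # Target index is read from the record itself
                "_index": str(record[index_field]),
                # The document body is a copy of the record, left unchanged
                "_source": dict(record),
            }
        )
    return actions


records = [
    {"index": "orders", "id": 1},
    {"index": "users", "id": 2},
    {"index": "orders", "id": 3},
]
actions = build_bulk_actions(records)
```

The resulting list could then be passed to `opensearchpy.helpers.bulk(client, actions)`, and creating the client with `http_compress=True` should enable request compression. Connection details such as host and authentication are left out here because they depend on your setup.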

If you'd rather use the connector, you can split the data by index name and write each subset separately:

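    from awsglue.dynamicframe import DynamicFrame

    # Build one Spark DataFrame per distinct value of the "index" column
    split_dataframes = {}
    for row in df.select("index").distinct().collect():
        col_value = row["index"]
        split_dataframes[col_value] = df.filter(df["index"] == col_value)

    # Write each subset to the OpenSearch index named after its "index" value
    for key, value in split_dataframes.items():
        glueContext.write_dynamic_frame.from_options(
            # from_options expects a DynamicFrame, so convert the Spark DataFrame first
            frame=DynamicFrame.fromDF(value, glueContext, f"frame_{key}"),
            connection_type="opensearch",
            connection_options={
                "opensearch.resource": key,
                "connectionName": "Opensearch connection",
            },
        )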
