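I need to parse a nested JSON file and extract a CSV with Talend Open Studio. The goal is to turn the nested JSON into a table.
The JSON has the following structure:
It is an array of elements (financial instruments, in my case), and each element contains another array level that holds that instrument's transactions.
In the example below there are three elements (identified by the isin and Description fields), and each element may have an array of transactionDetails.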
[
  {
    "transactionDetails": [
      {
        "tradeDate": "2023-02-13T00:00:00",
        "price": "90",
        "nominalAmount": 26000000.0
      },
      {
        "tradeDate": "2023-02-13T00:00:00",
        "price": "95",
        "nominalAmount": 1000000.0
      },
      {
        "tradeDate": "2023-02-13T00:00:00",
        "price": "97",
        "nominalAmount": 30000000.0
      }
    ],
    "Description": "Apple",
    "isin": "ISIN1"
  },
  {
    "transactionDetails": [
      {
        "tradeDate": "2023-02-13T00:00:00",
        "price": "88",
        "nominalAmount": 27000000.0
      },
      {
        "tradeDate": "2023-02-13T00:00:00",
        "price": "99",
        "nominalAmount": 1000000.0
      },
      {
        "tradeDate": "2023-02-13T00:00:00",
        "price": "96",
        "nominalAmount": 24000000.0
      }
    ],
    "Description": "Microsoft",
    "isin": "ISIN2"
  },
  {
    "Description": "Tesla",
    "isin": "ISIN3"
  }
]
The ideal output should list each isin with its reference data (Description) and all of its transaction details (three per isin in this example). The table below shows what I mean:
isin | Description | tradeDate | price | nominalAmount |
---|---|---|---|---|
ISIN1 | Apple | 2023-02-13T00:00:00 | 90 | 26000000.0 |
ISIN1 | Apple | 2023-02-13T00:00:00 | 95 | 1000000.0 |
ISIN1 | Apple | 2023-02-13T00:00:00 | 97 | 30000000.0 |
ISIN2 | Microsoft | 2023-02-13T00:00:00 | 88 | 27000000.0 |
ISIN2 | Microsoft | 2023-02-13T00:00:00 | 99 | 1000000.0 |
ISIN2 | Microsoft | 2023-02-13T00:00:00 | 96 | 24000000.0 |
ISIN3 | Tesla | - | - | - |
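Important: as the example shows, not every instrument has transactionDetails, but I still need those instruments in the table (with empty values in the transactionDetails columns, of course).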
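The required mapping is essentially a left-outer "unnest": each instrument produces one row per element of transactionDetails, or a single row with empty trade columns when the array is missing. Below is a minimal plain-Java sketch of that logic only. It assumes the JSON has already been read into objects, and the records `Trade`/`Instrument` and the class `FlattenSketch` are hypothetical names, not Talend or library APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: models the instruments as records (no JSON parsing)
// to show only the left-outer flattening logic.
class FlattenSketch {
    record Trade(String tradeDate, String price, double nominalAmount) {}
    record Instrument(String isin, String description, List<Trade> transactionDetails) {}

    // One row per trade, or one row with empty trade columns when
    // transactionDetails is missing or empty.
    static List<String[]> flatten(List<Instrument> instruments) {
        List<String[]> rows = new ArrayList<>();
        for (Instrument inst : instruments) {
            List<Trade> trades = inst.transactionDetails();
            if (trades == null || trades.isEmpty()) {
                rows.add(new String[] {inst.isin(), inst.description(), "", "", ""});
                continue;
            }
            for (Trade t : trades) {
                rows.add(new String[] {
                    inst.isin(), inst.description(), t.tradeDate(), t.price(),
                    // fixed-point text, matching the 26000000.0 style of the target table
                    String.format(Locale.ROOT, "%.1f", t.nominalAmount())});
            }
        }
        return rows;
    }

    // The sample data from the question
    static List<Instrument> sample() {
        String d = "2023-02-13T00:00:00";
        return List.of(
            new Instrument("ISIN1", "Apple", List.of(
                new Trade(d, "90", 26000000.0), new Trade(d, "95", 1000000.0), new Trade(d, "97", 30000000.0))),
            new Instrument("ISIN2", "Microsoft", List.of(
                new Trade(d, "88", 27000000.0), new Trade(d, "99", 1000000.0), new Trade(d, "96", 24000000.0))),
            new Instrument("ISIN3", "Tesla", null));
    }

    public static void main(String[] args) {
        System.out.println("isin,Description,tradeDate,price,nominalAmount");
        for (String[] row : flatten(sample())) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Running `main` prints the header followed by seven rows, the last one being `ISIN3,Tesla,,,`.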
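I have tried several approaches.
1 - The first was to create JSON metadata. I tried the following settings, but never got the result I need:
- with "$[*]" as the absolute path expression
- with "$[].transactionDetails[*]" as the absolute path expression
In both cases, either the "root" values (isin, Description) or the transactionDetails are written into the extracted fields as arrays, instead of being laid out as a table like in my example above. Am I setting the absolute or relative path expressions incorrectly?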
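2 - I then tried tExtractJSONFields, and I did manage to get what I wanted, but in a very convoluted way:
In this example I use a different file, but it has the same JSON structure.
Here I have two identical tFileInputJSON components, each with "$[*]" as the absolute path expression and the following relative path expressions:
- "isin"
- "Description"
- "transactionDetails[*]"
Then, in the first tExtractJSONFields, I extract isin and Description, while transactionDetails stays a field that contains an array. In the second tExtractJSONFields, I loop over transactionDetails to extract tradeDate, price and nominalAmount (but this only produces records for the instruments that have transactionDetails, not for the others).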
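So, in the end, to combine all instruments (those with transactionDetails and those without), I had to create two tMap components (so that both output datasets have exactly the same columns) and then merge them with a tUnite component.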
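The two-branch workaround can be modelled in plain Java to see why it is needed: looping over transactionDetails behaves like a flatMap that silently drops instruments without trades, so a second branch has to re-add them with null trade columns before the union. This is a hypothetical sketch of the data flow only; the class, record and method names are invented, and it is not code generated by Talend.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical data-flow model of the tExtractJSONFields + tMap + tUnite workaround.
class UnionSketch {
    record Trade(String tradeDate, String price, String nominalAmount) {}
    record Instrument(String isin, String description, List<Trade> transactionDetails) {}
    // The shared output schema that both tMap branches must produce
    record Row(String isin, String description, String tradeDate, String price, String nominalAmount) {}

    static boolean hasTrades(Instrument i) {
        return i.transactionDetails() != null && !i.transactionDetails().isEmpty();
    }

    // Branch 1: looping over transactionDetails emits rows only for instruments with trades
    static List<Row> withTrades(List<Instrument> in) {
        return in.stream()
                .filter(UnionSketch::hasTrades)
                .flatMap(i -> i.transactionDetails().stream()
                        .map(t -> new Row(i.isin(), i.description(),
                                t.tradeDate(), t.price(), t.nominalAmount())))
                .collect(Collectors.toList());
    }

    // Branch 2: instruments without trades, padded with nulls to the same columns
    static List<Row> withoutTrades(List<Instrument> in) {
        return in.stream()
                .filter(i -> !hasTrades(i))
                .map(i -> new Row(i.isin(), i.description(), null, null, null))
                .collect(Collectors.toList());
    }

    // Union of both branches: branch 1 rows first, then branch 2 rows
    static List<Row> union(List<Instrument> in) {
        return Stream.concat(withTrades(in).stream(), withoutTrades(in).stream())
                .collect(Collectors.toList());
    }

    // Two instruments from the question (ISIN1 trimmed to two trades)
    static List<Instrument> sample() {
        String d = "2023-02-13T00:00:00";
        return List.of(
                new Instrument("ISIN1", "Apple", List.of(
                        new Trade(d, "90", "26000000.0"), new Trade(d, "95", "1000000.0"))),
                new Instrument("ISIN3", "Tesla", null));
    }

    public static void main(String[] args) {
        System.out.println("branch 1 only: " + withTrades(sample()).size() + " rows");
        union(sample()).forEach(System.out::println);
    }
}
```

In this sketch the concatenation places the no-trade rows after all trade rows, so if the original instrument order matters, a single pass that emits rows instrument by instrument (or a sort after the union) is needed.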
Is there a simpler, more intuitive way to get the expected result?
You could try another JSON library, Josson, to convert the JSON to CSV.
https://github.com/octomix/josson
Deserialization
import com.octomix.josson.Josson;

Josson josson = Josson.fromJsonString(
"[" +
" {" +
" \"transactionDetails\": [" +
" {" +
" \"tradeDate\": \"2023-02-13T00:00:00\"," +
" \"price\": \"90\"," +
" \"nominalAmount\": 26000000.0" +
" }," +
" {" +
" \"tradeDate\": \"2023-02-13T00:00:00\"," +
" \"price\": \"95\"," +
" \"nominalAmount\": 1000000.0" +
" }," +
" {" +
" \"tradeDate\": \"2023-02-13T00:00:00\"," +
" \"price\": \"97\"," +
" \"nominalAmount\": 30000000.0" +
" }" +
" ]," +
" \"Description\": \"Apple\"," +
" \"isin\": \"ISIN1\"" +
" }," +
" {" +
" \"transactionDetails\": [" +
" {" +
" \"tradeDate\": \"2023-02-13T00:00:00\"," +
" \"price\": \"88\"," +
" \"nominalAmount\": 27000000.0" +
" }," +
" {" +
" \"tradeDate\": \"2023-02-13T00:00:00\"," +
" \"price\": \"99\"," +
" \"nominalAmount\": 1000000.0" +
" }," +
" {" +
" \"tradeDate\": \"2023-02-13T00:00:00\"," +
" \"price\": \"96\"," +
" \"nominalAmount\": 24000000.0" +
" }" +
" ]," +
" \"Description\": \"Microsoft\"," +
" \"isin\": \"ISIN2\"" +
" }," +
" {" +
" \"Description\": \"Tesla\"," +
" \"isin\": \"ISIN3\"" +
" }" +
"]");
Transformation
String csv = josson.getString(
"unwind(+transactionDetails)" +
".map(+Description, +isin, +tradeDate, +price, +nominalAmount)@" +
".csv()" +
".@join('\n')");
System.out.print(csv);
Output
Apple,ISIN1,2023-02-13T00:00:00,90,2.6E7
Apple,ISIN1,2023-02-13T00:00:00,95,1000000.0
Apple,ISIN1,2023-02-13T00:00:00,97,3.0E7
Microsoft,ISIN2,2023-02-13T00:00:00,88,2.7E7
Microsoft,ISIN2,2023-02-13T00:00:00,99,1000000.0
Microsoft,ISIN2,2023-02-13T00:00:00,96,2.4E7
Tesla,ISIN3,,,
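Note that nominalAmount appears as 2.6E7 here rather than 26000000.0 as in the desired table. That rendering matches Java's default `Double.toString`, which switches to scientific notation for magnitudes of 10^7 and above; whether Josson formats numbers this way internally is an assumption. A small sketch (the class name `PlainNumberSketch` is hypothetical) showing the default rendering and a fixed-point alternative you could apply when post-processing the values:

```java
import java.util.Locale;

// Hypothetical helper: compares Java's default double rendering with fixed-point text.
class PlainNumberSketch {
    // Double.toString uses scientific notation when the magnitude is >= 10^7
    static String javaDefault(double v) {
        return Double.toString(v);
    }

    // Fixed-point with one decimal and a locale-independent decimal separator
    static String plain(double v) {
        return String.format(Locale.ROOT, "%.1f", v);
    }

    public static void main(String[] args) {
        for (double v : new double[] {26000000.0, 1000000.0}) {
            // prints "2.6E7 -> 26000000.0", then "1000000.0 -> 1000000.0"
            System.out.println(javaDefault(v) + " -> " + plain(v));
        }
    }
}
```

`%.1f` rounds to one decimal place, so a wider pattern (or `BigDecimal.toPlainString()`) is safer if the source values can carry more decimals.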