I have the following Pig script, which I want to translate to Spark Scala:
FOREACH (GROUP callMetrics BY (datacenter, instance, tag, host_name, db_name, cluster_name, method)) {
    groupedCounts = FOREACH callMetrics GENERATE
        timestamp AS timestamp,
        sensor_value AS sensor_value,
        last_reset_time AS last_reset_time;
    GENERATE
        group.datacenter AS datacenter,
        group.instance AS instance,
        group.tag AS tag,
        group.host_name AS host_name,
        group.db_name AS db_name,
        group.cluster_name AS cluster_name,
        group.method AS method,
        FLATTEN(udf.compute_qps(groupedCounts)) AS (timestamp, qps);
};
I tried using groupBy in Spark, but it seems I cannot use it without some kind of aggregation.
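For reference, here is a rough sketch of the shape I am after, using the typed Dataset API (groupByKey followed by flatMapGroups) rather than groupBy. The case classes, the column types, and the body of computeQps are my own assumptions for illustration: the real udf.compute_qps is not shown here, so the stand-in simply treats sensor_value as a cumulative counter and assumes timestamps are in seconds.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical schema: column names come from the Pig script, types are assumed.
case class CallMetric(
    datacenter: String, instance: String, tag: String, host_name: String,
    db_name: String, cluster_name: String, method: String,
    timestamp: Long, sensor_value: Double, last_reset_time: Long)

// The seven grouping columns from the Pig GROUP ... BY clause.
case class GroupKey(
    datacenter: String, instance: String, tag: String, host_name: String,
    db_name: String, cluster_name: String, method: String)

// One output row per (group, timestamp), mirroring FLATTEN(...) AS (timestamp, qps).
case class QpsRow(
    datacenter: String, instance: String, tag: String, host_name: String,
    db_name: String, cluster_name: String, method: String,
    timestamp: Long, qps: Double)

object QpsJob {
  // Stand-in for udf.compute_qps (assumption, not the real UDF):
  // QPS = counter delta / time delta between consecutive samples that
  // share the same last_reset_time. Input tuples are
  // (timestamp, sensor_value, last_reset_time).
  def computeQps(samples: Seq[(Long, Double, Long)]): Seq[(Long, Double)] =
    samples.sortBy(_._1).sliding(2).collect {
      case Seq((t1, v1, r1), (t2, v2, r2)) if r1 == r2 && t2 > t1 =>
        (t2, (v2 - v1) / (t2 - t1))
    }.toList

  // Group by the seven key columns, then emit zero or more rows per group.
  def qpsPerGroup(callMetrics: Dataset[CallMetric]): Dataset[QpsRow] = {
    import callMetrics.sparkSession.implicits._
    callMetrics
      .groupByKey(m => GroupKey(m.datacenter, m.instance, m.tag, m.host_name,
        m.db_name, m.cluster_name, m.method))
      .flatMapGroups { (k, rows) =>
        // rows is an Iterator over the whole group; materialise it so it can be sorted.
        val samples =
          rows.map(m => (m.timestamp, m.sensor_value, m.last_reset_time)).toVector
        computeQps(samples).map { case (ts, qps) =>
          QpsRow(k.datacenter, k.instance, k.tag, k.host_name,
            k.db_name, k.cluster_name, k.method, ts, qps)
        }
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("qps-sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample rows, only to exercise the pipeline.
    val callMetrics = Seq(
      CallMetric("dc1", "i1", "t", "h1", "db", "c1", "get", 0L, 0.0, 0L),
      CallMetric("dc1", "i1", "t", "h1", "db", "c1", "get", 10L, 50.0, 0L)
    ).toDS()

    qpsPerGroup(callMetrics).show()
    spark.stop()
  }
}
```

Note that flatMapGroups hands each group to the function as a single iterator, and the sketch materialises it for sorting, so a very large group would have to fit in one executor's memory.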