Filtering out duplicates in Stream Analytics


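I'm receiving data from some sensors through several different bridges. The data contains a lot of duplicates: same serialNo, same values, (almost) the same dateTime and so on, but coming from different bridges. The data does not include any kind of unique eventId, only a timestamp that is unique for every event, even for duplicates. So I can't filter on that.

Here is an example: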

{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1583750353969,"dateTime":"2020-03-09T10:39:13Z","serialNo":"02001703","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"AE8B2FC5","rssi":-25,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":15.8,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":39,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001703","vif":7,"dif":27,"rssiWmbus":-94,"EventProcessedUtcTime":"2020-03-09T11:54:07.5197619Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-09T10:39:14.0440000Z"}
{"dsType":"WMBUS","mrfCuId":"B827EBE84EEB","timeStamp":1583750354377,"dateTime":"2020-03-09T10:39:14Z","serialNo":"02001703","manufacturer":"Lansen","modelNo":"LAN_WMBUS_G2_TH","battLvl":0,"bridgeId":"01000000","rssi":-35,"hopCnt":1,"latCnt":0,"dpCnt":2,"datapoint":[{"type":"FLOAT","name":"Temperature","size":32,"dataType":"BCD_DIGIT","unit":"C","res":0.1,"resUnit":"Degrees","valueType":"CSV","value":15.8,"scale":1.0,"min":"-20","max":"55","low":" ","high":" "},{"type":"NUMBER","name":"Humidity","size":8,"dataType":"UINT8","unit":"%","res":1.0,"resUnit":"%","valueType":"CSV","value":39,"scale":1.0,"min":" ","max":" ","low":" ","high":" "}],"uniqueId":"LAS02001703","vif":7,"dif":27,"rssiWmbus":-80,"EventProcessedUtcTime":"2020-03-09T11:54:07.5197619Z","PartitionId":0,"EventEnqueuedUtcTime":"2020-03-09T10:39:14.4190000Z"}

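Is there some way of filtering out the duplicates in Stream Analytics? If possible, this data will eventually flow on to Power BI as well. But when using "Remove duplicates" in Power BI, you need some kind of EventId that is unique among all other events but identical for the duplicates.

Thanks in advance!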

powerbi powerbi-desktop azure-stream-analytics stream-analytics
1 Answer

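Based on your description, you want something like the DISTINCT feature of a relational database, so that rows can be filtered according to certain columns.

ASA has some limitations here. The main idea is to combine COUNT(DISTINCT ...) with GROUP BY over a time window.

For example, my test data is as follows: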

[image: test data]

SQL:

SELECT COUNT(DISTINCT b.timestamp), b.dsType, b.mrfCuId
FROM blobstream b
GROUP BY b.dsType, b.mrfCuId, TumblingWindow(minute, 5)

Output:

[image: query output]

I got some clues from this official example.
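To make the grouping idea easier to reason about outside ASA, here is a minimal Python sketch that simulates it offline. It is not ASA code and not necessarily how the query above behaves in every detail. It assumes that duplicates share the same serialNo and the same datapoint values, and that they fall into the same 5-minute tumbling window computed from timeStamp. The names dedup_key and deduplicate are hypothetical helpers, and the two events are trimmed-down copies of the samples in the question.

```python
WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window, mirroring TumblingWindow(minute, 5)


def dedup_key(event):
    """Fields assumed to be identical across duplicates of one reading."""
    values = tuple((dp["name"], dp["value"]) for dp in event["datapoint"])
    return (event["serialNo"], values)


def deduplicate(events, window_ms=WINDOW_MS):
    """Keep the earliest event per (window, key) and drop later copies."""
    kept = {}
    for event in sorted(events, key=lambda e: e["timeStamp"]):
        window = event["timeStamp"] // window_ms  # index of the tumbling window
        kept.setdefault((window, dedup_key(event)), event)
    return list(kept.values())


# Trimmed-down versions of the two sample events from the question.
events = [
    {
        "serialNo": "02001703",
        "bridgeId": "AE8B2FC5",
        "timeStamp": 1583750353969,
        "datapoint": [
            {"name": "Temperature", "value": 15.8},
            {"name": "Humidity", "value": 39},
        ],
    },
    {
        "serialNo": "02001703",
        "bridgeId": "01000000",
        "timeStamp": 1583750354377,
        "datapoint": [
            {"name": "Temperature", "value": 15.8},
            {"name": "Humidity", "value": 39},
        ],
    },
]

unique = deduplicate(events)
print(len(unique), unique[0]["bridgeId"])  # prints: 1 AE8B2FC5
```

Two caveats apply to this sketch: duplicates that straddle a window boundary are both kept, and two genuinely separate readings with identical values inside one window collapse into one. Whether that is acceptable depends on how often the sensors report.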
