I have something like a netflow table and want to group it by (src_ip, src_port, dst_ip, dst_port), where the values may be swapped between the src and dst fields.
src_ip | src_port | dst_ip | dst_port | bytes_sent |
---|---|---|---|---|
192.168.1.1 | 123 | 192.168.10.5 | 321 | 111 |
192.168.10.5 | 321 | 192.168.1.1 | 123 | 222 |
10.0.0.5 | 50 | 172.0.0.5 | 55 | 500 |
172.0.0.5 | 55 | 10.0.0.5 | 50 | 300 |
192.168.1.1 | 123 | 192.168.10.5 | 321 | 1000 |
192.168.1.1 | 123 | 192.168.10.5 | 20 | 999 |
From this table I would like to get the following result:
src_ip | src_port | dst_ip | dst_port | bytes_sent | bytes_recv |
---|---|---|---|---|---|
192.168.1.1 | 123 | 192.168.10.5 | 321 | 1111 | 222 |
10.0.0.5 | 50 | 172.0.0.5 | 55 | 500 | 300 |
192.168.1.1 | 123 | 192.168.10.5 | 20 | 999 | 0 |
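Basically, I am trying to capture traffic in both directions in a single row. So it is like grouping by (src_ip, src_port) and (dst_ip, dst_port), except that these values can be reversed. What is the best way to achieve this?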
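To decide which IP and port go in which direction, you have to define a rule for who counts as the sender and who counts as the receiver in the aggregated result. Let's take the smaller IP as the source and the larger IP as the destination. Then use the same CASE expression repeatedly to decide which original column goes into which result column. Once that is done, aggregate your data.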
with data as
(
  select
    case when src_ip < dst_ip then src_ip else dst_ip end as source_ip,
    case when src_ip < dst_ip then dst_ip else src_ip end as dest_ip,
    case when src_ip < dst_ip then src_port else dst_port end as source_port,
    case when src_ip < dst_ip then dst_port else src_port end as dest_port,
    case when src_ip < dst_ip then bytes_sent else 0 end as sent,
    case when src_ip < dst_ip then 0 else bytes_sent end as received
  from mytable
)
select
  source_ip, source_port, dest_ip, dest_port,
  sum(sent) as bytes_sent,
  sum(received) as bytes_received
from data
group by source_ip, source_port, dest_ip, dest_port
order by source_ip, source_port, dest_ip, dest_port;
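The same rule can be sketched outside the database. The query above compares only the IPs, so a flow whose two endpoints share an IP has no defined direction, and text-typed IPs are ordered lexically rather than numerically. The Python sketch below orders whole (ip, port) endpoints instead and parses the addresses with the standard `ipaddress` module. The names `ROWS` and `aggregate_flows` are made up for this illustration, and the data is the sample from the question.

```python
from collections import defaultdict
from ipaddress import ip_address

# Sample rows from the question: (src_ip, src_port, dst_ip, dst_port, bytes_sent)
ROWS = [
    ("192.168.1.1", 123, "192.168.10.5", 321, 111),
    ("192.168.10.5", 321, "192.168.1.1", 123, 222),
    ("10.0.0.5", 50, "172.0.0.5", 55, 500),
    ("172.0.0.5", 55, "10.0.0.5", 50, 300),
    ("192.168.1.1", 123, "192.168.10.5", 321, 1000),
    ("192.168.1.1", 123, "192.168.10.5", 20, 999),
]


def aggregate_flows(rows):
    """Sum bytes per bidirectional flow.

    Assumption for this sketch: the smaller (ip, port) endpoint is
    treated as the source, the larger one as the destination.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [bytes_sent, bytes_recv]
    for src_ip, src_port, dst_ip, dst_port, nbytes in rows:
        src = (ip_address(src_ip), src_port)
        dst = (ip_address(dst_ip), dst_port)
        if src <= dst:
            # Row already runs in the canonical direction: count as sent.
            totals[(src, dst)][0] += nbytes
        else:
            # Row runs the other way: swap endpoints, count as received.
            totals[(dst, src)][1] += nbytes
    # Convert the address objects back to strings for readable keys.
    return {
        (str(a[0]), a[1], str(b[0]), b[1]): tuple(v)
        for (a, b), v in totals.items()
    }


if __name__ == "__main__":
    for key, (sent, recv) in aggregate_flows(ROWS).items():
        print(*key, sent, recv)
```

Any consistent ordering of the two endpoints gives the same groups; the choice only affects which side is labelled as the source.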
You can use a self join:
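with cte as (
  -- total bytes per direction (sum, so repeated rows of one flow add up)
  select n.src_ip, n.src_port, n.dst_ip, n.dst_port, sum(n.bytes_sent) as bytes_sent
  from netflow n
  group by n.src_ip, n.src_port, n.dst_ip, n.dst_port
)
select n.src_ip, n.src_port, n.dst_ip, n.dst_port, n.bytes_sent,
       coalesce(n1.bytes_sent, 0) as bytes_received
from cte n
-- n1 is the same flow in the opposite direction: all four fields must match
left join cte n1 on n1.src_ip = n.dst_ip and n1.src_port = n.dst_port
                and n1.dst_ip = n.src_ip and n1.dst_port = n.src_port
-- keep one-way flows, and for two-way flows only the row with the smaller source
where n1.src_ip is null
   or n.src_ip < n.dst_ip
   or (n.src_ip = n.dst_ip and n.src_port < n.dst_port);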
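This assumes that the smallest IP is the source IP and the largest is the destination IP.
You can use the LEAST and GREATEST functions to make sure that exactly one entry is chosen for each combination of smallest and largest IP address: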
with cte as (
  select least(src_ip, dst_ip) as smallestIP, greatest(src_ip, dst_ip) as largestIP
  from mytable src
  group by least(src_ip, dst_ip), greatest(src_ip, dst_ip)
),
routes as (
  select distinct src_ip, src_port, dst_ip, dst_port
  from (
    -- rows already oriented smallest -> largest
    select src_ip, src_port, dst_ip, dst_port
    from mytable t
    inner join cte c on t.src_ip = c.smallestIP and t.dst_ip = c.largestIP
    union all
    -- rows oriented largest -> smallest, with the endpoints swapped
    select dst_ip as src_ip, dst_port as src_port, src_ip as dst_ip, src_port as dst_port
    from mytable t
    inner join cte c on t.dst_ip = c.smallestIP and t.src_ip = c.largestIP
  ) as s
)
select r.src_ip, r.src_port, r.dst_ip, r.dst_port,
  sum(case when r.src_ip = t.src_ip and r.src_port = t.src_port
            and r.dst_ip = t.dst_ip and r.dst_port = t.dst_port
           then bytes_sent else 0 end) as bytes_sent,
  sum(case when r.src_ip = t.dst_ip and r.src_port = t.dst_port
            and r.dst_ip = t.src_ip and r.dst_port = t.src_port
           then bytes_sent else 0 end) as bytes_recv
from routes r
inner join mytable t on (
    r.src_ip = t.src_ip and r.src_port = t.src_port
    and r.dst_ip = t.dst_ip and r.dst_port = t.dst_port
  )
  or (
    r.src_ip = t.dst_ip and r.src_port = t.dst_port
    and r.dst_ip = t.src_ip and r.dst_port = t.src_port
  )
group by r.src_ip, r.src_port, r.dst_ip, r.dst_port
You can get the output you want by combining GROUP BY, CASE, and the SUM function.
The query can be written as follows:
SELECT
  CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END AS src_ip,
  CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END AS src_port,
  CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END AS dst_ip,
  CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END AS dst_port,
  SUM(CASE WHEN src_ip < dst_ip THEN bytes_sent ELSE 0 END) AS bytes_sent,
  SUM(CASE WHEN src_ip < dst_ip THEN 0 ELSE bytes_sent END) AS bytes_recv
FROM your_table
GROUP BY
  CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END,
  CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END,
  CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END,
  CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END;
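The CASE expressions above order the src and dst values lexically, so that both directions of a flow fall into the same group. The SUM and CASE expressions then aggregate the bytes_sent values separately for each direction (forward and reverse).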
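As a sanity check, the GROUP BY/CASE query can be run against the sample rows from the question. The sketch below does so with Python's built-in `sqlite3` module and an in-memory table. The table name `netflow`, the helper `run_bidirectional_query`, and the `source_`/`dest_` aliases are assumptions made for this example. IPs are stored as text, so the ordering is lexical, as described above.

```python
import sqlite3

# Sample rows from the question: (src_ip, src_port, dst_ip, dst_port, bytes_sent)
SAMPLE_ROWS = [
    ("192.168.1.1", 123, "192.168.10.5", 321, 111),
    ("192.168.10.5", 321, "192.168.1.1", 123, 222),
    ("10.0.0.5", 50, "172.0.0.5", 55, 500),
    ("172.0.0.5", 55, "10.0.0.5", 50, 300),
    ("192.168.1.1", 123, "192.168.10.5", 321, 1000),
    ("192.168.1.1", 123, "192.168.10.5", 20, 999),
]

# Same CASE/SUM/GROUP BY pattern as in the answer. Aliases differ from the
# column names so that no identifier in the query is ambiguous.
QUERY = """
SELECT
  CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END AS source_ip,
  CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END AS source_port,
  CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END AS dest_ip,
  CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END AS dest_port,
  SUM(CASE WHEN src_ip < dst_ip THEN bytes_sent ELSE 0 END) AS total_sent,
  SUM(CASE WHEN src_ip < dst_ip THEN 0 ELSE bytes_sent END) AS total_recv
FROM netflow
GROUP BY
  CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END,
  CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END,
  CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END,
  CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END
ORDER BY source_ip, source_port, dest_ip, dest_port
"""


def run_bidirectional_query(rows=SAMPLE_ROWS):
    """Load rows into an in-memory SQLite table and return the grouped result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE netflow (src_ip TEXT, src_port INTEGER, "
            "dst_ip TEXT, dst_port INTEGER, bytes_sent INTEGER)"
        )
        conn.executemany("INSERT INTO netflow VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_bidirectional_query():
        print(row)
```

On the sample data this gives the three rows the question asks for, sorted by the ORDER BY clause that was added here for a stable order.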
Use least(), greatest(), and an aggregate filter. Demo at db<>fiddle:
select least(src_ip, dst_ip) AS src_ip,
       -- each port must follow its IP; least()/greatest() on the ports alone would mix up endpoints
       case when src_ip = least(src_ip, dst_ip) then src_port else dst_port end AS src_port,
       greatest(src_ip, dst_ip) AS dst_ip,
       case when src_ip = least(src_ip, dst_ip) then dst_port else src_port end AS dst_port,
       coalesce(sum(bytes_sent) filter (where src_ip = least(src_ip, dst_ip)), 0) AS bytes_sent,
       coalesce(sum(bytes_sent) filter (where dst_ip = least(src_ip, dst_ip)), 0) AS bytes_recv
from netflow group by 1,2,3,4;
src_ip | src_port | dst_ip | dst_port | bytes_sent | bytes_recv |
---|---|---|---|---|---|
10.0.0.5 | 50 | 172.0.0.5 | 55 | 500 | 300 |
192.168.1.1 | 123 | 192.168.10.5 | 20 | 999 | 0 |
192.168.1.1 | 123 | 192.168.10.5 | 321 | 1111 | 222 |
If you would rather get null where there is no traffic, you can drop the coalesce() that replaces it with 0.
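The difference between the two variants can be seen on a one-way flow. The sketch below uses Python's `sqlite3` module and, as a stand-in for the aggregate filter clause, a `CASE` without `ELSE` inside `SUM`, which also yields null when no row matches and runs on older SQLite versions. The table name `netflow` and the helper `recv_with_and_without_coalesce` are assumptions for this illustration.

```python
import sqlite3

# Stand-in for sum(...) filter (where ...): a CASE without ELSE yields NULL
# for non-matching rows, and SUM over only NULLs is NULL.
QUERY = """
SELECT
  SUM(CASE WHEN src_ip > dst_ip THEN bytes_sent END) AS recv_nullable,
  COALESCE(SUM(CASE WHEN src_ip > dst_ip THEN bytes_sent END), 0) AS recv_zero
FROM netflow
"""


def recv_with_and_without_coalesce():
    """Return (recv_nullable, recv_zero) for a flow with no reverse traffic."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE netflow (src_ip TEXT, src_port INTEGER, "
            "dst_ip TEXT, dst_port INTEGER, bytes_sent INTEGER)"
        )
        # The one-way flow from the question: nothing is ever sent back.
        conn.execute(
            "INSERT INTO netflow VALUES ('192.168.1.1', 123, '192.168.10.5', 20, 999)"
        )
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(recv_with_and_without_coalesce())  # (None, 0)
```

Without coalesce() the missing direction comes back as null (`None` in Python), which keeps "no traffic seen" distinct from a real total of 0 bytes.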