PostgreSQL - Group by distinct pairs of values

Question

I have something like a netflow table and want to group it by (src_ip, src_port, dst_ip, dst_port), where the values may be swapped between the src and dst fields.

src_ip	src_port	dst_ip	dst_port	bytes_sent
192.168.1.1	123	192.168.10.5	321	111
192.168.10.5	321	192.168.1.1	123	222
10.0.0.5	50	172.0.0.5	55	500
172.0.0.5	55	10.0.0.5	50	300
192.168.1.1	123	192.168.10.5	321	1000
192.168.1.1	123	192.168.10.5	20	999

I would like to get the following result from this table:

src_ip	src_port	dst_ip	dst_port	bytes_sent	bytes_recv
192.168.1.1	123	192.168.10.5	321	1111	222
10.0.0.5	50	172.0.0.5	55	500	300
192.168.1.1	123	192.168.10.5	20	999	0

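Basically, I am trying to capture the traffic in both directions in a single row. So it is like grouping by (src_ip, src_port) and (dst_ip, dst_port), except that these values can be reversed. What is the best way to achieve this?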

Answer 1

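To decide which IP, port and direction to report, you have to establish a rule for who counts as the sender and who as the receiver in the aggregated result. Let's take the smaller IP as the source and the larger IP as the destination. Then use the same CASE expression over and over to decide which original column goes into which result column. Once that is done, aggregate your data.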

with 
  data as
  (
    select
      case when src_ip < dst_ip then  src_ip      else  dst_ip      end as source_ip,
      case when src_ip < dst_ip then  dst_ip      else  src_ip      end as dest_ip,
      case when src_ip < dst_ip then  src_port    else  dst_port    end as source_port,
      case when src_ip < dst_ip then  dst_port    else  src_port    end as dest_port,
      case when src_ip < dst_ip then  bytes_sent  else  0           end as sent,
      case when src_ip < dst_ip then  0           else  bytes_sent  end as received
    from mytable
  )
select
  source_ip, source_port, dest_ip, dest_port,
  sum(sent) as bytes_sent,
  sum(received) as bytes_received
from data
group by source_ip, source_port, dest_ip, dest_port
order by source_ip, source_port, dest_ip, dest_port;
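
To try the query above, a minimal setup along these lines should be enough. The table name mytable comes from the query, but the column types are an assumption, since the question does not state them. With inet, the < comparison orders addresses numerically; with text columns it would be a plain string comparison. Either way the rule stays consistent within one table, which is all the grouping needs.

```sql
-- Hypothetical setup: the column types are assumed, not taken from the question.
create table mytable (
  src_ip     inet    not null,
  src_port   integer not null,
  dst_ip     inet    not null,
  dst_port   integer not null,
  bytes_sent bigint  not null
);

-- Sample rows copied from the question.
insert into mytable (src_ip, src_port, dst_ip, dst_port, bytes_sent) values
  ('192.168.1.1',  123, '192.168.10.5', 321,  111),
  ('192.168.10.5', 321, '192.168.1.1',  123,  222),
  ('10.0.0.5',      50, '172.0.0.5',     55,  500),
  ('172.0.0.5',     55, '10.0.0.5',      50,  300),
  ('192.168.1.1',  123, '192.168.10.5', 321, 1000),
  ('192.168.1.1',  123, '192.168.10.5',  20,  999);
```

Run against these rows, the query should collapse the six input rows into the three flows the question asks for.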

Answer 2

You can use a self join:

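with cte as (
   -- one row per directed flow, with its total bytes
   select row_number() over (order by n.src_ip, n.src_port, n.dst_ip, n.dst_port) r,
          n.src_ip, n.src_port, n.dst_ip, n.dst_port, sum(n.bytes_sent) bytes_sent
   from netflow n
   group by n.src_ip, n.src_port, n.dst_ip, n.dst_port
)
select n.src_ip, n.src_port, n.dst_ip, n.dst_port, n.bytes_sent,
   coalesce(n1.bytes_sent, 0) bytes_received
from cte n
left join cte n1  -- the same flow in the opposite direction
   on (n1.src_ip, n1.src_port, n1.dst_ip, n1.dst_port) = (n.dst_ip, n.dst_port, n.src_ip, n.src_port)
-- keep a flow only if its reverse does not come earlier in the ordering
where not exists (select 1 from cte n2 where n2.r < n.r
   and (n2.src_ip, n2.src_port, n2.dst_ip, n2.dst_port) = (n.dst_ip, n.dst_port, n.src_ip, n.src_port));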


Answer 3

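Assume that the smallest IP is the source IP and the largest is the destination IP.

You can use the LEAST and GREATEST functions to make sure that exactly one entry is selected for each combination of smallest and largest IP address: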

with cte as (
  select least(src_ip, dst_ip) as smallestIP, greatest(src_ip, dst_ip) as largestIP
  from mytable src
  group by least(src_ip, dst_ip), greatest(src_ip, dst_ip)
),
routes as (
  select distinct src_ip, src_port, dst_ip, dst_port 
  from (
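    select src_ip, src_port, dst_ip, dst_port
    from mytable t
    -- match the whole pair, so an IP that is the smallest of a different pair cannot match here
    inner join cte c on t.src_ip = c.smallestIP and t.dst_ip = c.largestIP
    union all
    select dst_ip as src_ip, dst_port as src_port, src_ip as dst_ip, src_port as dst_port
    from mytable t
    inner join cte c on t.dst_ip = c.smallestIP and t.src_ip = c.largestIP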
  ) as s
)
select r.src_ip, r.src_port, r.dst_ip, r.dst_port,
       sum(case when r.src_ip = t.src_ip and r.src_port = t.src_port
                     and r.dst_ip = t.dst_ip and r.dst_port = t.dst_port
                then bytes_sent else 0 end ) as bytes_sent,
       sum(case when r.src_ip = t.dst_ip and r.src_port = t.dst_port
                     and r.dst_ip = t.src_ip and r.dst_port = t.src_port
                then bytes_sent else 0 end ) as bytes_recv
from routes r
inner join mytable t on (
                        r.src_ip = t.src_ip and r.src_port = t.src_port
                        and r.dst_ip = t.dst_ip and r.dst_port = t.dst_port)
                      or (
                        r.src_ip = t.dst_ip and r.src_port = t.dst_port
                        and r.dst_ip = t.src_ip and r.dst_port = t.src_port
                      )
group by r.src_ip, r.src_port, r.dst_ip, r.dst_port


Answer 4

You can get the output you want by aggregating with a combination of GROUP BY, CASE and the SUM function.

The query can be written as follows:

SELECT
    CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END AS src_ip,
    CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END AS src_port,
    CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END AS dst_ip,
    CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END AS dst_port,
    SUM(CASE WHEN src_ip < dst_ip THEN bytes_sent ELSE 0 END) AS bytes_sent,
    SUM(CASE WHEN src_ip < dst_ip THEN 0 ELSE bytes_sent END) AS bytes_recv
FROM your_table
GROUP BY
    CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END,
    CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END,
    CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END,
    CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END;

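The CASE expressions above put the src and dst values into a fixed order based on how the two IPs compare, so that both directions of a flow end up in the same group. SUM combined with CASE then aggregates the bytes_sent values separately for each direction (forward and reverse).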
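
One case the comparison on IPs alone does not settle is traffic where src_ip equals dst_ip (for example loopback): both directions then take the ELSE branch and are not merged. A possible variation, sketched here under the assumption that all four columns are NOT NULL, compares (ip, port) pairs as rows and computes the direction once. The aliases t, d and fwd are made up for this sketch.

```sql
-- Sketch only: your_table is the placeholder name used above; t, d and fwd are invented aliases.
SELECT
    CASE WHEN d.fwd THEN t.src_ip   ELSE t.dst_ip   END AS src_ip,
    CASE WHEN d.fwd THEN t.src_port ELSE t.dst_port END AS src_port,
    CASE WHEN d.fwd THEN t.dst_ip   ELSE t.src_ip   END AS dst_ip,
    CASE WHEN d.fwd THEN t.dst_port ELSE t.src_port END AS dst_port,
    SUM(CASE WHEN d.fwd THEN t.bytes_sent ELSE 0 END) AS bytes_sent,
    SUM(CASE WHEN d.fwd THEN 0 ELSE t.bytes_sent END) AS bytes_recv
FROM your_table AS t
CROSS JOIN LATERAL (
    -- Row comparison: the IPs are compared first, the ports only break ties.
    SELECT (t.src_ip, t.src_port) <= (t.dst_ip, t.dst_port) AS fwd
) AS d
GROUP BY 1, 2, 3, 4;
```

Grouping by position (1, 2, 3, 4) avoids repeating the four CASE expressions and sidesteps the name clash between the output aliases and the input columns.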

Answer 5

You can reduce it to least(), greatest() and an aggregate filter:

select least(   src_ip,   dst_ip  ) AS src_ip,
       least(   src_port, dst_port) AS src_port,
       greatest(src_ip,   dst_ip  ) AS dst_ip,
       greatest(src_port, dst_port) AS dst_port,
       coalesce( sum(bytes_sent)filter(where src_ip=least(src_ip,dst_ip))
                ,0) AS bytes_sent,
       coalesce( sum(bytes_sent)filter(where dst_ip=least(src_ip,dst_ip))
                ,0) AS bytes_recv
from netflow group by 1,2,3,4;

src_ip	src_port	dst_ip	dst_port	bytes_sent	bytes_recv
10.0.0.5	50	172.0.0.5	55	500	300
192.168.1.1	20	192.168.10.5	123	999	0
192.168.1.1	123	192.168.10.5	321	1111	222

If you prefer null where there was no traffic, you can drop the coalesce().
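
Note that the query above applies least() and greatest() to the ports independently of the IPs, which is why its second output row shows ports 20 and 123 where the question expects 123 and 20. It would also merge two distinct flows that differ only in which side uses which port. A possible adjustment, offered as a sketch rather than as the original answer's method, keeps each port attached to its IP and also drops coalesce() so that a direction without traffic shows null:

```sql
-- Sketch: same netflow table as above; ports follow the IP they belong to.
select least(src_ip, dst_ip)    as src_ip,
       case when src_ip <= dst_ip then src_port else dst_port end as src_port,
       greatest(src_ip, dst_ip) as dst_ip,
       case when src_ip <= dst_ip then dst_port else src_port end as dst_port,
       -- without coalesce(), a direction with no matching rows yields null
       sum(bytes_sent) filter (where src_ip <= dst_ip) as bytes_sent,
       sum(bytes_sent) filter (where src_ip >  dst_ip) as bytes_recv
from netflow
group by 1, 2, 3, 4;
```

On the question's sample rows this should report the 999-byte flow as 192.168.1.1:123 to 192.168.10.5:20, with null in bytes_recv.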
