Get the top x% of rows for each unique value in a column, ranked by another column's value

Question

Table "tags":

Source  Target      Weight
#003    blitzkrank  0.83
#003    deutsch     0.7
#003    brammen     0.57
#003    butzfrauen  0.55
#003    solaaaa     0.5
#003    moments     0.3
college scandal     1.15
college prosecutors 0.82
college students    0.41
college usc         0.33
college full house  0.17
college friends     0.08
college house       0.5
college friend      0.01

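The table has 5.6 million rows and ~91,000 unique entries in the "Source" column.

For each unique value in "Source" and in "Target", I need the top x% of rows by "Weight" (e.g. top 20%, top 30%; the percentage needs to be variable). The table is sorted by "Source" (ascending) and then by "Weight" (descending).

If rows have the same "Weight", take them in alphabetical order.
If x% of the rows rounds to 0, take at least one row.

Since there will be duplicates (e.g. Source = "college" will produce at least one duplicate row because of Target = "scandal"), duplicates should be removed if possible. If that is not possible, it is no big deal.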

Calculation for "Source":

6 rows where Source = "#003", 6 * 0.2 = 1.2 = take 1 row
8 rows where Source = "college", 8 * 0.2 = 1.6 = take 2 rows

Desired result table for "Source":

Source  Target      Weight
#003    blitzkrank  0.83
college scandal     1.15
college prosecutors 0.82

How can I do this in SQL in an SQLite database?

Answer 1

If you want to sample by source:

select t.*
from (select t.*,
             row_number() over (partition by source order by weight desc, target) as seqnum,
             count(*) over (partition by source) as cnt
      from t
     ) t
where seqnum = 1 or  -- always at least one row
      seqnum <= round(cnt * 0.2);

Based on your example, I think this is what you want. You can build a similar query for target.
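The question also asks for a variable percentage and for duplicates to be dropped when the source and target results are combined. Below is a minimal sketch of one way that might look, driving the same query from Python's standard `sqlite3` module. It assumes the table is called `tags` with columns `source`, `target` and `weight`. The helper names (`make_demo_db`, `top_fraction_sql`, `top_by_source`, `top_by_source_or_target`) are hypothetical and only for illustration. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    ("#003", "blitzkrank", 0.83), ("#003", "deutsch", 0.7),
    ("#003", "brammen", 0.57), ("#003", "butzfrauen", 0.55),
    ("#003", "solaaaa", 0.5), ("#003", "moments", 0.3),
    ("college", "scandal", 1.15), ("college", "prosecutors", 0.82),
    ("college", "students", 0.41), ("college", "usc", 0.33),
    ("college", "full house", 0.17), ("college", "friends", 0.08),
    ("college", "house", 0.5), ("college", "friend", 0.01),
]

# Same shape as the answer's query; {grp} is the grouping column and
# {tie} the alphabetical tie-breaker. :frac is bound at execution time.
TEMPLATE = """
select source, target, weight
from (select t.*,
             row_number() over (partition by {grp} order by weight desc, {tie}) as seqnum,
             count(*) over (partition by {grp}) as cnt
      from tags t)
where seqnum = 1 or seqnum <= round(cnt * :frac)
"""

def make_demo_db():
    """Build an in-memory database holding the sample rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tags (source text, target text, weight real)")
    conn.executemany("insert into tags values (?, ?, ?)", ROWS)
    return conn

def top_fraction_sql(group_col):
    """Return the query text for one grouping column (whitelisted names only)."""
    other = {"source": "target", "target": "source"}
    if group_col not in other:
        raise ValueError("group_col must be 'source' or 'target'")
    return TEMPLATE.format(grp=group_col, tie=other[group_col])

def top_by_source(conn, frac):
    """Top `frac` of rows per source, at least one row per source."""
    sql = top_fraction_sql("source") + " order by source, weight desc, target"
    return conn.execute(sql, {"frac": frac}).fetchall()

def top_by_source_or_target(conn, frac):
    """Combine both groupings; UNION (not UNION ALL) drops duplicate rows."""
    sql = (top_fraction_sql("source") + " union " + top_fraction_sql("target")
           + " order by source, weight desc, target")
    return conn.execute(sql, {"frac": frac}).fetchall()

if __name__ == "__main__":
    conn = make_demo_db()
    for row in top_by_source(conn, 0.2):
        print(row)
```

Calling `top_by_source(conn, 0.3)` instead changes only the bound fraction. Note that in the sample data every target occurs exactly once, so the per-target branch of `top_by_source_or_target` keeps every row through the "at least one row" rule; on the full table the two branches would overlap only partially, and `union` removes the rows they share.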
