I have this table of favorite foods (called "food_table"). Note that there is no information for (burger, burger):
food_1 food_2 number_of_people
pizza pizza 3
chocolate pizza 3
tacos pizza 10
burger pizza 2
pizza chocolate 6
chocolate chocolate 5
tacos chocolate 4
burger chocolate 6
pizza tacos 9
chocolate tacos 10
tacos tacos 5
burger tacos 3
pizza burger 9
chocolate burger 9
tacos burger 9
I am trying to build a 4x4 matrix that shows the relative percentage of people who like each combination:
# step 1: counts (food1 , food2)
           pizza  chocolate  tacos  burger
pizza          3          6      9       9
chocolate      3          5     10       9
tacos         10          4      5       9
burger      NULL          6      3       3
# step 2: percents (each row should add to 100)
(food1 , food2)
e.g. (pizza,pizza) = 3/(3+6+9+9), (pizza,chocolate) = 6/(3+6+9+9), (pizza,tacos) = 9/(3+6+9+9), (pizza,burger) = 9/(3+6+9+9)
              pizza  chocolate     tacos    burger
pizza      11.11111   22.22222  33.33333  33.33333
chocolate  11.11111   18.51852  37.03704  33.33333
tacos      35.71429   14.28571  17.85714  32.14286
burger         NULL         50        25        25
I tried this:
with step1 as (
select
food_1,
food_2,
number_of_people * 100/ sum(number_of_people) over (partition by food_1) as percent
from food_table
group by
food_1,
food_2),
step2 as(
select food_1 as "food1/food2",
max(case when food_2 = 'pizza' then percent end) as pizza,
max(case when food_2 = 'tacos' then percent end) as tacos,
max(case when food_2 = 'burger' then percent end) as burger,
max(case when food_2 = 'chocolate' then percent end) as chocolate
from step1
group by "food1/food2")
select * from step2;
But I don't think this is correct: the result of the code does not match my hand calculation.
How can I fix this?
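First, the matrix you show is incorrect: (burger, pizza) cannot be NULL, because the table above lists 2 for it. The missing combination is (burger, burger), so the burger row of counts should be 2, 6, 3, NULL, and its percentages 18.18, 54.55, 27.27, NULL (each count divided by 2+6+3 = 11).
Second, the SQL in step 1 uses an OLAP (window) function, which does not need a GROUP BY.
Try this modified SQL: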
with step1 as (
select
food_1,
food_2,
number_of_people * 100/ sum(number_of_people) over (partition by food_1) as percent
from food_table
)
, step2 as(
select food_1 as "food1/food2",
max(case when food_2 = 'pizza' then percent end) as pizza,
max(case when food_2 = 'tacos' then percent end) as tacos,
max(case when food_2 = 'burger' then percent end) as burger,
max(case when food_2 = 'chocolate' then percent end) as chocolate
from step1
group by food_1)
select * from step2;
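If number_of_people is an integer column, number_of_people * 100 / sum(...) is typically computed with integer arithmetic, so each percentage loses its fractional part and a row will not add up to 100. You will probably want to change the data type to decimal, for example by multiplying by 100.0 or casting before the division.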
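To check the arithmetic end to end, here is a minimal sketch that loads the rows from the question into an in-memory SQLite database and runs a decimal version of the query. SQLite, the Python harness, the helper name percent_matrix, and rounding to two places are all assumptions made for illustration; the question does not say which database is in use, and other engines may need a cast to decimal instead of 100.0.

```python
import sqlite3

# Rows copied from the question's food_table; (burger, burger) is absent.
ROWS = [
    ("pizza", "pizza", 3), ("chocolate", "pizza", 3),
    ("tacos", "pizza", 10), ("burger", "pizza", 2),
    ("pizza", "chocolate", 6), ("chocolate", "chocolate", 5),
    ("tacos", "chocolate", 4), ("burger", "chocolate", 6),
    ("pizza", "tacos", 9), ("chocolate", "tacos", 10),
    ("tacos", "tacos", 5), ("burger", "tacos", 3),
    ("pizza", "burger", 9), ("chocolate", "burger", 9),
    ("tacos", "burger", 9),
]

QUERY = """
with step1 as (
    select food_1,
           food_2,
           -- 100.0 (not 100) keeps the division out of integer arithmetic
           number_of_people * 100.0
               / sum(number_of_people) over (partition by food_1) as percent
    from food_table
)
select food_1,
       round(max(case when food_2 = 'pizza' then percent end), 2) as pizza,
       round(max(case when food_2 = 'chocolate' then percent end), 2) as chocolate,
       round(max(case when food_2 = 'tacos' then percent end), 2) as tacos,
       round(max(case when food_2 = 'burger' then percent end), 2) as burger
from step1
group by food_1
order by food_1
"""


def percent_matrix():
    """Return {food_1: (pizza, chocolate, tacos, burger)} as row percentages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table food_table "
        "(food_1 text, food_2 text, number_of_people integer)"
    )
    conn.executemany("insert into food_table values (?, ?, ?)", ROWS)
    result = {row[0]: row[1:] for row in conn.execute(QUERY)}
    conn.close()
    return result


if __name__ == "__main__":
    for food, percents in percent_matrix().items():
        print(food, percents)
```

With this data the pizza, chocolate and tacos rows match the hand calculation in the question, and the burger row comes out as 18.18, 54.55, 27.27 with no value for (burger, burger).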