I want to create pairwise averages of the prices of products made in different countries. My data looks like this:
df <- data.frame(country = c("US; UK; FI", "CN; IT; US; GR", "UK; US"),
product_id = c(1, 2, 3),
price = c(300, 500, 200))
I want to transform the data to get the average price for each pair of countries, something like this:
Ctr_1 Ctr_2 Avg_Price
US UK 250
US FI 300
US CN 500
US IT 500
UK FI 300
UK US 250
CN IT 500
CN US 500
CN GR 500
IT CN 500
IT US 500
IT GR 500
GR CN 500
GR IT 500
GR US 500
I tried converting the data to long format:
library(data.table)
setDT(df)
df1 <- df[, .(country = unlist(strsplit(country, "; "))), by = .(product_id)]
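but I don't know how to continue from here. Any help would be much appreciated. In fact, there is also a year variable, and the idea is to aggregate the pairs for each year to create a panel dataset.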
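One possible way to continue the data.table attempt above (a sketch, not part of the original thread): it assumes price is kept in the by list so that it survives the split, which the attempt above does not do. The long table is then merged with itself on product_id and the price is averaged per ordered country pair. The names pairs and res and the _1/_2 suffixes are illustrative choices.

```r
library(data.table)

df <- data.frame(country = c("US; UK; FI", "CN; IT; US; GR", "UK; US"),
                 product_id = c(1, 2, 3),
                 price = c(300, 500, 200))
setDT(df)

# Long format: one row per country; price is added to `by` (assumption)
# so it is carried along with each country
df1 <- df[, .(country = unlist(strsplit(country, "; "))),
          by = .(product_id, price)]

# Self-join on product_id: pairs every country with every country that makes
# the same product; allow.cartesian is needed because the join grows the rows
pairs <- merge(df1, df1, by = "product_id", allow.cartesian = TRUE,
               suffixes = c("_1", "_2"))
pairs <- pairs[country_1 != country_2]  # drop a country paired with itself

# Average the price over all products each ordered pair has in common
res <- pairs[, .(Avg_Price = mean(price_1)),
             by = .(Ctr_1 = country_1, Ctr_2 = country_2)]
setorder(res, Ctr_1, Ctr_2)
res
```

Because both sides of the join come from the same product, price_1 and price_2 are always equal, so averaging either one gives the per-pair mean (for example US/UK: the mean of 300 and 200).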
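Here is a solution using the tidyverse packages. The important steps are:
- unnest creates a long dataset after the country field has been split into a list
- left_join of the data with itself on product_id creates the Cartesian product of rows, i.e. every pair of countries that share a product
library(dplyr)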
library(tidyr)
df %>%
  # split "US; UK; FI" into a list column, then one row per country
  mutate(country = strsplit(country, "; +")) %>%
  unnest(cols = c(country)) %>%
  # self-join on product_id: all country pairs that share a product
  left_join(., ., by = "product_id", relationship = "many-to-many") %>%
  # drop pairs of a country with itself
  filter(country.x != country.y) %>%
  rename(Country1 = country.x,
         Country2 = country.y) %>%
  # average the price over all products each ordered pair has in common
  # (price.x and price.y are identical within a product)
  group_by(Country1, Country2) %>%
  summarise(Avg_Price = mean(price.x), .groups = "drop")
## # A tibble: 18 × 3
##    Country1 Country2 Avg_Price
##    <chr>    <chr>        <dbl>
##  1 CN       GR             500
##  2 CN       IT             500
##  3 CN       US             500
##  4 FI       UK             300
##  5 FI       US             300
##  6 GR       CN             500
##  7 GR       IT             500
##  8 GR       US             500
##  9 IT       CN             500
## 10 IT       GR             500
## 11 IT       US             500
## 12 UK       FI             300
## 13 UK       US             250
## 14 US       CN             500
## 15 US       FI             300
## 16 US       GR             500
## 17 US       IT             500
## 18 US       UK             250
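The question also mentions a year variable for building a panel. A minimal sketch, assuming a hypothetical year column: the df_panel data below, including its 2021 row and price of 320, is invented purely for illustration. The idea is to add year to the join keys and to group_by() so that countries are only paired within the same product and the same year.

```r
library(dplyr)
library(tidyr)

# Hypothetical panel data: `year` and the 2021 row are made up for illustration
df_panel <- data.frame(
  country    = c("US; UK; FI", "CN; IT; US; GR", "UK; US", "US; UK"),
  product_id = c(1, 2, 3, 1),
  year       = c(2020, 2020, 2020, 2021),
  price      = c(300, 500, 200, 320)
)

# One row per (product, year, country)
long <- df_panel %>%
  mutate(country = strsplit(country, "; +")) %>%
  unnest(cols = c(country))

panel <- long %>%
  # pair countries only within the same product *and* the same year
  inner_join(long, by = c("product_id", "year"),
             relationship = "many-to-many") %>%
  filter(country.x != country.y) %>%
  rename(Country1 = country.x, Country2 = country.y) %>%
  # one average price per year and ordered country pair
  group_by(year, Country1, Country2) %>%
  summarise(Avg_Price = mean(price.x), .groups = "drop")

panel
```

With this made-up data, 2020 reproduces the 18 pairs above and 2021 adds only the US/UK and UK/US pairs at 320. The relationship argument requires dplyr 1.1.0 or later.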