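I'm facing the following problem: I'm trying to compute a weighted count of technology classes per country and per year. I'm starting from a data frame like this: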
library(dplyr)
df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))
Here id is the ID of a patent that is associated with one or more uspc_class values and was produced by one or more countries.
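The sample weights look as if each patent's weight is split evenly across its distinct countries. That rule is an assumption inferred from the sample, not something stated in the question. The hypothetical check below computes 1 / (number of distinct countries per id) and compares it with fractional_count.

```r
library(dplyr)

# Same sample data as in the question
df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

# Assumption: weight = 1 / number of distinct countries on the same patent
check <- df %>%
  group_by(id) %>%
  mutate(implied_weight = 1 / n_distinct(country)) %>%
  ungroup() %>%
  mutate(matches = implied_weight == as.numeric(fractional_count))

all(check$matches)
```

If the real data uses a different weighting rule, this check will simply report FALSE for the rows that disagree.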
For each uspc_class, I want to count how many there are per country per year.
I can get a plain (unweighted) count with the following code:
df_count <- df %>%
  group_by(uspc_class, country, year) %>%
  dplyr::summarise(cc_ijt = n()) %>%
  ungroup()
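This gives me the counts in the cc_ijt variable of the df_count data frame. However, because the same id sometimes has more than one country, I want to take that into account to avoid double counting.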
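To make the double counting concrete, here is a small sketch comparing rows per patent and class with the summed weights. The object names rows_per_patent and weight_per_patent are illustrative only. With plain counting, a patent shared by two countries contributes 2. With the weights summed, every patent-class pair contributes exactly 1.

```r
library(dplyr)

df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

# Unweighted: one row per country, so a two-country patent is counted twice
rows_per_patent <- df %>%
  count(id, uspc_class, name = "rows")

# Weighted: summing fractional_count gives each patent-class pair a total of 1
weight_per_patent <- df %>%
  mutate(fractional_count = as.numeric(fractional_count)) %>%
  group_by(id, uspc_class) %>%
  summarise(total_weight = sum(fractional_count), .groups = "drop")
```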
That is, my code produces a data frame like this:
df_count <- tibble(
  uspc_class = c("A", "A", "B", "B", "C", "C", "D"),
  country = c("CA", "US", "DE", "FR", "DE", "FR", "IT"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  cc_ijt = c(1, 1, 1, 1, 1, 1, 1))
whereas what I want to get is this:
df_count <- tibble(
  uspc_class = c("A", "A", "B", "B", "C", "C", "D"),
  country = c("CA", "US", "DE", "FR", "DE", "FR", "IT"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  cc_ijt = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1))
In other words, each row that contributes to cc_ijt for a given uspc_class should be weighted by its fractional_count.
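Written out, the requested quantity is a sum of weights rather than a row count. In the formula below, the indices i, j and t (class, country, year) are read off the name cc_ijt, and w_r stands for the fractional_count of row r. This notation is illustrative and does not come from the question.

$$
cc_{ijt} = \sum_{r \,:\, \text{class}_r = i,\ \text{country}_r = j,\ \text{year}_r = t} w_r
$$

For class A, country US, year 1975 there is a single matching row with $w_r = 0.5$, so $cc_{ijt} = 0.5$. The plain n() count gives 1 for the same row.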
How can I modify my code to do this? Thanks!
This should work:
library(tidyverse)
df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))
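df %>%
  # fractional_count is stored as text, so convert it before doing arithmetic
  mutate(fractional_count = as.numeric(fractional_count)) %>%
  group_by(uspc_class, country, year) %>%
  # summing the weights stays correct when a group contains several rows;
  # n() * fractional_count only works when every group has exactly one row
  summarise(n = n(),
            weighted_count = sum(fractional_count))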
#> `summarise()` has grouped output by 'uspc_class', 'country'. You can override
#> using the `.groups` argument.
#> # A tibble: 7 × 5
#> # Groups:   uspc_class, country [7]
#>   uspc_class country year      n weighted_count
#>   <chr>      <chr>   <chr> <int>          <dbl>
#> 1 A          CA      1975      1            0.5
#> 2 A          US      1975      1            0.5
#> 3 B          DE      1976      1            0.5
#> 4 B          FR      1976      1            0.5
#> 5 C          DE      1976      1            0.5
#> 6 C          FR      1976      1            0.5
#> 7 D          IT      1980      1            1
Created on 2023-04-14 with reprex v2.0.2
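A more compact alternative is dplyr's count(), whose wt argument sums a weight column per group instead of counting rows. The sketch below converts fractional_count to numeric first and names the result cc_ijt to match the question. The output is an ungrouped tibble, so no ungroup() call is needed.

```r
library(dplyr)

df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

# count(..., wt = w) computes sum(w) within each group instead of the row count
df_count <- df %>%
  mutate(fractional_count = as.numeric(fractional_count)) %>%
  count(uspc_class, country, year, wt = fractional_count, name = "cc_ijt")

df_count
```

Because the weights are summed, the result stays correct when a (class, country, year) group contains several rows. For example, two patents from the same country in the same year would contribute 0.5 + 0.5.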