Weighted count of technology classes per country and year


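I'm facing the following problem: I'm trying to compute a weighted count of technology classes per country and per year. I'm starting from a data frame like this: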

library(dplyr)
df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

Here id is the id of a patent that is associated with one or more uspc_class(es) and was produced by one or more countries.

For each uspc_class, I want to count how many there are per country and per year.

I can do a regular (unweighted) count with the following code:

df_count <- df %>%
  group_by(uspc_class, country, year) %>%
  dplyr::summarise(cc_ijt = n()) %>%
  ungroup()

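This gives me the counts in the cc_ijt variable of the df_count data frame. However, because the same id is sometimes associated with more than one country, I want to take this into account to avoid double counting.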
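In the sample data, every fractional_count value equals 1 divided by the number of distinct countries on the patent (id 01 has US and CA, so 0.5; id 03 has only IT, so 1). The sketch below assumes that rule, which the question does not state explicitly, and recomputes the weight in a hypothetical column `frac_computed`. It then checks that each (id, uspc_class) pair contributes exactly 1 in total across its countries, which is what prevents double counting.

```r
library(dplyr)

df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

# Assumption: the weight is 1 / (number of distinct countries on the patent).
# frac_computed is a hypothetical column name used only for illustration.
df_weighted <- df %>%
  group_by(id) %>%
  mutate(frac_computed = 1 / n_distinct(country)) %>%
  ungroup()

# The recomputed weights agree with the supplied (character) fractional_count column.
all.equal(df_weighted$frac_computed, as.numeric(df_weighted$fractional_count))

# Each patent/class pair sums to 1 across its countries, so it is counted once overall.
pair_totals <- df_weighted %>%
  group_by(id, uspc_class) %>%
  summarise(total = sum(frac_computed), .groups = "drop")

pair_totals
```

If the real data uses a different weighting scheme, only the `mutate()` line needs to change; the per-pair check still shows whether the weights add up to one patent per class.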

Currently, my code produces a data frame like this:

df_count <- tibble(
  uspc_class = c("A", "A", "B", "B", "C", "C", "D"), 
  country = c("CA", "US", "DE", "FR", "DE",  "FR", "IT"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  cc_ijt = c("1", "1", "1", "1", "1", "1", "1"))

What I would like to get is this:

df_count <- tibble(
  uspc_class = c("A", "A", "B", "B", "C", "C", "D"), 
  country = c("CA", "US", "DE", "FR", "DE",  "FR", "IT"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  cc_ijt = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

In other words, the count of each uspc_class in cc_ijt must be weighted by fractional_count.
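For reference, dplyr's `count()` takes a `wt` argument that sums the given weights within each group instead of counting rows, and a `name` argument that sets the output column. A minimal sketch, assuming fractional_count is first converted from character to numeric (as stored in the question it is text, which `sum()` cannot add):

```r
library(dplyr)

df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

df_count <- df %>%
  # fractional_count is stored as text, so convert it before summing
  mutate(fractional_count = as.numeric(fractional_count)) %>%
  # wt = fractional_count sums the weights per group; name sets the result column
  count(uspc_class, country, year, wt = fractional_count, name = "cc_ijt")

df_count
```

Because the input is ungrouped, `count()` returns an ungrouped tibble, so no separate `ungroup()` step is needed.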

How can I modify my code to do this? Thanks!

r dplyr count weighted
1 Answer

This should work.

library(tidyverse)

df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"), 
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

df %>%
  group_by(uspc_class, country, year) %>%
  mutate(fractional_count = as.numeric(fractional_count)) %>%
  summarise(n = n(),
            # sum the weights within each group so every group yields one value
            weighted_count = sum(fractional_count))
#> `summarise()` has grouped output by 'uspc_class', 'country'. You can override
#> using the `.groups` argument.
#> # A tibble: 7 × 5
#> # Groups:   uspc_class, country [7]
#>   uspc_class country year      n weighted_count
#>   <chr>      <chr>   <chr> <int>          <dbl>
#> 1 A          CA      1975      1            0.5
#> 2 A          US      1975      1            0.5
#> 3 B          DE      1976      1            0.5
#> 4 B          FR      1976      1            0.5
#> 5 C          DE      1976      1            0.5
#> 6 C          FR      1976      1            0.5
#> 7 D          IT      1980      1            1

Created on 2023-04-14 with reprex v2.0.2
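In the sample data every (uspc_class, country, year) group has exactly one row, so the weighted count simply equals that row's weight. To see the aggregation at work, the sketch below adds one hypothetical patent, id "04" (not part of the original data), filed only in the US in 1975 with class A; its weight of 1 is then added to the 0.5 from patent 01. It also passes `.groups = "drop"` to `summarise()`, which returns an ungrouped tibble and avoids the grouping message shown above.

```r
library(dplyr)

df <- tibble(
  id = c("01", "01", "02", "02", "02", "02", "03"),
  year = c("1975", "1975", "1976", "1976", "1976", "1976", "1980"),
  country = c("US", "CA", "DE", "DE", "FR", "FR", "IT"),
  uspc_class = c("A", "A", "B", "C", "B", "C", "D"),
  fractional_count = c("0.5", "0.5", "0.5", "0.5", "0.5", "0.5", "1"))

# Hypothetical extra patent "04": US only, class A, 1975, full weight of 1
df_extra <- bind_rows(
  df,
  tibble(id = "04", year = "1975", country = "US",
         uspc_class = "A", fractional_count = "1"))

res <- df_extra %>%
  mutate(fractional_count = as.numeric(fractional_count)) %>%
  group_by(uspc_class, country, year) %>%
  # n counts rows; weighted_count adds up the fractional weights
  summarise(n = n(),
            weighted_count = sum(fractional_count),
            .groups = "drop")

res
```

Here the A / US / 1975 group has n = 2 and weighted_count = 1.5, while all other groups are unchanged.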
