How can I automate data frame manipulation so that any team name I pass in returns the manipulated version of that team's data? (R)


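My goal is some kind of automated process in which I can supply a team's name and have it substituted at every occurrence in a block of code. I have been trying to do this with a function. Below I provide a subset of the manipulation I want to perform, a manually coded version that does the same thing, and a function in which "team" stands in for the example team name "falcons".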

Tidied manual version:

library(dplyr)
library(scales)  # provides percent()

falconscap2022 <- falconscap2022 |>
  mutate(cash = sum(average),
         nump = n(), 
         .by = posg) |>
  mutate(cap_pct = percent(cash/sum(cash)))

falconspicks2022 <- mutate(falconspicks2022, top100 = as.numeric(pick) <= 100, .keep = "unused")

falcons2022stuff <- falconscap2022 %>%
  left_join(falconspicks2022, by = "posg")

Data:

falconscap2022 <- structure(list(average = c(18333333, 16823333, 9375000, 8227624, 
5500000, 5383617, 5250000, 4850000, 3681822, 3576437), posg = c("OL", 
"Front7", "QB", "TE", "DB", "WR", "RB", "ST", "OL", "DB")), row.names = c(NA, 
-10L), class = "data.frame")

falconspicks2022 <- structure(list(pick = c("10", "109", "200", "94", "109"), posg = c("OL", 
"Front7", "DB", "WR", "DB")), row.names = c(NA, -5L), class = "data.frame")

Untidied function version:

datacleanup <- function(team){
  
  
##each team will have their own precleaned data assigned, example of falcons would be falconscap2022
teamcap2022$average <- gsub(",", "",  teamcap2022$average) 
teamcap2022$average <- gsub("\\$", "",  teamcap2022$average)
teamcap2022$average <- as.numeric(teamcap2022$average)


teamcap2022 <- teamcap2022 %>%
  group_by(posg) %>%
  mutate(cash = sum(average),
            nump = n()) 

teamcap2022 <- teamcap2022 %>%
   ungroup() %>%
  mutate(cap_pct = percent(cash/sum(cash)))

##each team will also have their own precleaned picks df, example is falconspicks2022
teampicks2022 <- teampicks2022 %>%
  mutate(
    top100 = case_when(
      pick < 101 ~ 1,
      TRUE ~ 0
    ),
    latepicks = case_when( 
      pick > 100 ~ 1,
      TRUE ~ 0
      )
  ) %>% select(c("top100", "latepicks", "posg"))


team2022stuff <- teamcap2022 %>%
  left_join(teampicks2022, by = "posg")
  
  
}

I would like to be able to run something like datacleanup(falcons) and get the same output as the manual code.

r function dplyr data-manipulation data-cleaning
1 Answer
datacleanup2 <- function(df, df2) {
  df |>
    mutate(average = stringr::str_remove_all(average, ",|\\$") |>
             as.numeric()) |>
    mutate(cash = sum(average),
           nump = n(), .by= posg) |>
    mutate(cap_pct = scales::percent(cash/sum(cash))) |>
    left_join(df2 |>
                # pick is stored as character, so convert it before comparing;
                # otherwise "94" < 101 is evaluated as a string comparison
                mutate(pick = as.numeric(pick),
                       top100 = as.numeric(pick < 101),
                       latepicks = as.numeric(pick > 100)) |>
                select(posg, top100, latepicks))
}

datacleanup2(falconscap2022, falconspicks2022)

Result

Joining with `by = join_by(posg)`
    average   posg     cash nump cap_pct top100 latepicks
1  18333333     OL 22015155    2  19.64%      1         0
2  16823333 Front7 16823333    1  15.01%      0         1
3   9375000     QB  9375000    1   8.36%     NA        NA
4   8227624     TE  8227624    1   7.34%     NA        NA
5   5500000     DB  9076437    2   8.10%      0         1
6   5500000     DB  9076437    2   8.10%      0         1
7   5383617     WR  5383617    1   4.80%      1         0
8   5250000     RB  5250000    1   4.68%     NA        NA
9   4850000     ST  4850000    1   4.33%     NA        NA
10  3681822     OL 22015155    2  19.64%      1         0
11  3576437     DB  9076437    2   8.10%      0         1
12  3576437     DB  9076437    2   8.10%      0         1
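
The function above takes the two data frames directly. If the goal is to pass only the team name, as in datacleanup("falcons"), one option is to build the object names from the string and look them up with get(). The following is a minimal sketch under that assumption: datacleanup_by_name, its year and env arguments, and the small stand-in data frames are hypothetical and not part of the original code. The steps mirror the tidied manual version from the question, and .by requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical stand-ins so the sketch runs on its own; in practice these
# would be the pre-cleaned per-team data frames from the question.
falconscap2022 <- data.frame(
  average = c(18333333, 16823333, 3681822),
  posg    = c("OL", "Front7", "OL")
)
falconspicks2022 <- data.frame(
  pick = c("10", "109"),
  posg = c("OL", "Front7")
)

# Hypothetical wrapper: build the object names from the team name and year,
# fetch the data frames, then apply the same steps as the manual version.
datacleanup_by_name <- function(team, year = 2022, env = parent.frame()) {
  cap   <- get(paste0(team, "cap", year), envir = env)
  picks <- get(paste0(team, "picks", year), envir = env)

  cap <- cap |>
    mutate(cash = sum(average), nump = n(), .by = posg) |>
    # numeric share; wrap in scales::percent() to format it as in the question
    mutate(cap_pct = cash / sum(cash))

  picks <- picks |>
    # pick is character, so convert it before comparing
    mutate(top100 = as.numeric(pick) <= 100, .keep = "unused")

  left_join(cap, picks, by = "posg")
}

# The team name is passed as a string, not as a bare symbol.
falcons2022stuff <- datacleanup_by_name("falcons")
```

To process several teams, something like lapply(c("falcons", "saints"), datacleanup_by_name) would return a list of results, assuming data frames with the matching names exist ("saints" is only an illustration). Keeping the per-team data frames in a named list and indexing it by team name is often easier to maintain than looking objects up with get().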