Renaming variables from key:value pairs in R


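I'm working with some public employment data from the BEA (CAINC4 from this page: https://apps.bea.gov/regional/downloadzip.cfm). I reshaped the data to calculate employment rates, but the variable names are very messy. I tried to write a function that renames them based on key:value pairs, without success.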

library(ggplot2)
library(usmap)
library(readr)
library(dplyr)
library(stringr)
library(reshape2)
library(tidyr)
library(shiny)
library(cowplot)
library(plotly)


 inc_rr = structure(list(fips = c("01001", "01003", "01005", "01007", "01009", 
"01011"), geoname = c("Autauga, AL", "Baldwin, AL", "Barbour, AL", 
"Bibb, AL", "Blount, AL", "Bullock, AL"), region = c(5L, 5L, 
5L, 5L, 5L, 5L), tablename = c("CAINC4", "CAINC4", "CAINC4", 
"CAINC4", "CAINC4", "CAINC4"), x1969_10 = c(69973, 157191, 51130, 
29810, 69322, 22716), x1969_20 = c(25166, 56951, 23818, 14994, 
26411, 12554), x1969_7010 = c(6630, 20008, 9428, 4087, 7660, 
4370), x1969_7020 = c(5457, 14886, 7149, 3344, 4839, 3431), x1970_10 = c(77712, 
172745, 57657, 32551, 70070, 25439), x1970_20 = c(24606, 59474, 
22655, 13798, 26999, 11749), x1970_7010 = c(6853, 19749, 9448, 
3965, 7587, 4281), x1970_7020 = c(5650, 14713, 7323, 3259, 4891, 
3360)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

## Function for renaming data

# Write linecode dictionary
codes <- list(
  inc = 10,
  pop = 20,
  emp = 7010,
  wemp = 7020
)

codes <- list(
  "10" = "inc",
  "20" = "pop",
  "7010" = "emp",
  "7020" = "wemp"
)

rename_vars <- function(var_name) {
  var_name <- sub("^x", "", var_name)  # Remove the "x" prefix
  year <- substr(var_name, 1, 4)  # Extract the year from the variable name
  type <- substr(var_name, start = 6, stop = nchar(var_name)) # extract the line code
}

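I wasn't sure which code list to use, so there are two. As far as I can tell, the block above runs without errors. I'd like to write something like this: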


rename_vars <- function(var_name) {
  var_name <- sub("^x", "", var_name)  # Remove the "x" prefix
  year <- substr(var_name, 1, 4)  # Extract the year from the variable name
  type <- substr(var_name, start = 6, stop = nchar(var_name)) # extract the line code
  if (type %in% names(codes)) {
    new_name = paste0(type, year, sep = "_")
    return(new_name)
  } else {
    return(var_name)
  }
}

test <- inc_rr %>%
  rename_with(rename_vars, starts_with("x"))

But this doesn't work for me: the variable names don't change at all.

Examples of variable names in the reshaped data that I want to apply this function to:

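x1969_10, x1969_7020, x1980_20, etc. The years run from 1969 to 2022, and I only use the line codes listed in the code list. The basic idea is for the function to strip the "x" prefix, store the variable's year, match the line code to its key, and rename each variable to key_year.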

r function pivot rename
1 Answer

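There are a few issues with your function:

  1. if (type %in% names(codes))
    rename_with() passes all of the selected column names to the function at once, so this condition is a logical vector with one element per name (length(type)) rather than a single TRUE or FALSE, and if() expects a single value.
  2. new_name = paste0(type, year, sep = "_")
    This pastes the numeric line code back together with the year instead of looking up the matching name in codes. In addition, paste0() has no sep argument, so "_" is simply appended as one more string; use paste() instead.
  3. As written, the function handles one name at a time, so it needs to be vectorized.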
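
The first two points can be checked directly in the console. The snippet below is a minimal sketch in base R; the column name x1980_99 is a made-up example that is not in the question's data, included only to show a code with no match.

```r
# Same mapping as the second `codes` list in the question
codes <- list("10" = "inc", "20" = "pop", "7010" = "emp", "7020" = "wemp")

# rename_with() hands the function every selected name at once
var_name <- c("x1969_10", "x1969_7020", "x1980_99")  # "x1980_99" is hypothetical
stripped <- sub("^x", "", var_name)
type <- substr(stripped, 6, nchar(stripped))

in_codes <- type %in% names(codes)
in_codes
# [1]  TRUE  TRUE FALSE   (one value per name, not a single TRUE/FALSE)

# paste0() has no `sep` argument, so "_" is treated as one more string to append
bad_name <- paste0("10", "1969", sep = "_")
bad_name
# [1] "101969_"

# paste() does honour `sep`
good_name <- paste("inc", "1969", sep = "_")
good_name
# [1] "inc_1969"
```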

Try this:

rename_vars <- function(var_name) {
  var_name <- sub("^x", "", var_name)  # Remove the "x" prefix
  year <- substr(var_name, 1, 4)  # Extract the year from the variable name
  type <- substr(var_name, start = 6, stop = nchar(var_name)) # extract the line code
  # use any() to reduce the condition to a single logical value
  if (any(type %in% names(codes))) {
    # get the matching name from the code list
    type_name <- codes[[which(names(codes) == type)]]
    # paste it together with the year
    new_name <- paste(type_name, year, sep = "_")
    return(new_name)
  } else {
    return(var_name) 
  }
}

Then vectorize it:

vrename_vars <- Vectorize(rename_vars)
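
For readers unfamiliar with Vectorize(), here is a small sketch of what it does, using a toy function (scalar_label is a hypothetical name, not part of the question): it wraps a function that only works on one value so that it is applied to each element in turn. With the default USE.NAMES = TRUE and a character input, the result carries the inputs as names.

```r
# A function that only works on a single value, because of the scalar if()
scalar_label <- function(code) {
  if (code == "10") "inc" else "other"
}

# Vectorize() returns a wrapper that calls scalar_label() once per element
vlabel <- Vectorize(scalar_label)

labels <- vlabel(c("10", "20"))
labels
#      10      20
#   "inc" "other"

# Drop the names if only the values are needed
unname(labels)
# [1] "inc"   "other"
```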

And test:

inc_rr %>%
  rename_with(vrename_vars, starts_with("x")) %>%
  colnames()
 [1] "fips"      "geoname"   "region"    "tablename"
 [5] "inc_1969"  "pop_1969"  "emp_1969"  "wemp_1969"
 [9] "inc_1970"  "pop_1970"  "emp_1970"  "wemp_1970"
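
As an alternative to Vectorize(), the lookup itself can be written so that it works on a whole vector of names. This is only a sketch under a couple of assumptions: code_lookup and rename_vars_vec are hypothetical names, the mapping is stored as a named character vector instead of a list, and names with an unknown line code are returned unchanged (including the "x" prefix), which differs slightly from the function above.

```r
# Named character vector: line code -> short name (same mapping as `codes`)
code_lookup <- c("10" = "inc", "20" = "pop", "7010" = "emp", "7020" = "wemp")

rename_vars_vec <- function(var_name) {
  stripped <- sub("^x", "", var_name)            # remove the "x" prefix
  year <- substr(stripped, 1, 4)                 # first four characters are the year
  type <- substr(stripped, 6, nchar(stripped))   # everything after "YYYY_"
  label <- unname(code_lookup[type])             # NA where the code is not in the lookup
  # keep the original name when there is no match, otherwise build key_year
  ifelse(is.na(label), var_name, paste(label, year, sep = "_"))
}

new_names <- rename_vars_vec(c("x1969_10", "x1970_7020", "x1980_99"))
new_names
# [1] "inc_1969"  "wemp_1970" "x1980_99"
```

Because the function accepts and returns a character vector of the same length, it could be passed to rename_with() directly, for example rename_with(inc_rr, rename_vars_vec, starts_with("x")).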