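I'm working with some public employment data from BEA (CAINC4 from this page: https://apps.bea.gov/regional/downloadzip.cfm). I reshaped the data to compute employment rates, but the variable names are very messy. I tried to write a function that renames them based on key:value pairs, but it didn't work.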
library(ggplot2)
library(usmap)
library(readr)
library(dplyr)
library(stringr)
library(reshape2)
library(tidyr)
library(shiny)
library(cowplot)
library(plotly)
inc_rr = structure(list(fips = c("01001", "01003", "01005", "01007", "01009",
"01011"), geoname = c("Autauga, AL", "Baldwin, AL", "Barbour, AL",
"Bibb, AL", "Blount, AL", "Bullock, AL"), region = c(5L, 5L,
5L, 5L, 5L, 5L), tablename = c("CAINC4", "CAINC4", "CAINC4",
"CAINC4", "CAINC4", "CAINC4"), x1969_10 = c(69973, 157191, 51130,
29810, 69322, 22716), x1969_20 = c(25166, 56951, 23818, 14994,
26411, 12554), x1969_7010 = c(6630, 20008, 9428, 4087, 7660,
4370), x1969_7020 = c(5457, 14886, 7149, 3344, 4839, 3431), x1970_10 = c(77712,
172745, 57657, 32551, 70070, 25439), x1970_20 = c(24606, 59474,
22655, 13798, 26999, 11749), x1970_7010 = c(6853, 19749, 9448,
3965, 7587, 4281), x1970_7020 = c(5650, 14713, 7323, 3259, 4891,
3360)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
## Function for renaming data
# Write linecode dictionary
codes <- list(
inc = 10,
pop = 20,
emp = 7010,
wemp = 7020
)
codes <- list(
"10" = "inc",
"20" = "pop",
"7010" = "emp",
"7020" = "wemp"
)
rename_vars <- function(var_name) {
  var_name <- sub("^x", "", var_name) # Remove the "x" prefix
  year <- substr(var_name, 1, 4) # Extract the year from the variable name
  type <- substr(var_name, start = 6, stop = nchar(var_name)) # Extract the line code
}
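I wasn't sure which codes list to use, so there are two. As far as I can tell, everything up to the end of the block above runs without errors. I'd like to write something like: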
rename_vars <- function(var_name) {
  var_name <- sub("^x", "", var_name) # Remove the "x" prefix
  year <- substr(var_name, 1, 4) # Extract the year from the variable name
  type <- substr(var_name, start = 6, stop = nchar(var_name)) # Extract the line code
  if (type %in% names(codes)) {
    new_name = paste0(type, year, sep = "_")
    return(new_name)
  } else {
    return(var_name)
  }
}
test <- inc_rr %>%
  rename_with(rename_vars, starts_with("x"))
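But this doesn't work for me: the variable names don't change at all.
Examples of variable names in the reshaped data that I want to apply this function to:
x1969_10, x1969_7020, x1980_20, etc. The years run from 1969 to 2022, and I'm only using the line codes listed in the codes list. The basic idea is for the function to remove the "x" prefix, store the variable's year, match the line code to its key, and rename each variable to key_year.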
Your function has a few problems:
(type %in% names(codes))
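This fails because rename_with() passes all the selected column names to the function at once, so type is a character vector and the condition has length length(type) rather than 1, while if() needs a single TRUE or FALSE.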
new_name = paste0(type, year, sep = "_")
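This never uses the mapped name: it just pastes the line code and the year back together. Also, paste0() has no sep argument, so the "_" is simply appended as one more string; use paste() with sep = "_" instead. Try this: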
# Relies on the second `codes` list above (line code -> short name)
rename_vars <- function(var_name) {
  var_name <- sub("^x", "", var_name) # Remove the "x" prefix
  year <- substr(var_name, 1, 4) # Extract the year from the variable name
  type <- substr(var_name, start = 6, stop = nchar(var_name)) # Extract the line code
  # Use any() to reduce the condition to a logical of length one
  if (any(type %in% names(codes))) {
    # Get the matching name from the codes list
    type_name <- codes[[which(names(codes) == type)]]
    # Build the new name as name_year
    new_name <- paste(type_name, year, sep = "_")
    return(new_name)
  } else {
    return(var_name)
  }
}
Then vectorize it:
vrename_vars <- Vectorize(rename_vars)
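To see why the wrapper is needed, here is a minimal sketch of what the original function body computes when it receives several names at once, which is how rename_with() calls it. The sample names and the variable names (nm, in_codes, with_paste0, with_paste) are made up for illustration, and the note about if() reflects my understanding of R 4.2.0 and later.

```r
# Code -> name lookup, same as the second `codes` list in the question
codes <- list("10" = "inc", "20" = "pop", "7010" = "emp", "7020" = "wemp")

# Two sample column names, passed together as one character vector
nm <- c("x1969_10", "x1969_7020")

stripped <- sub("^x", "", nm)                  # drop the "x" prefix
year <- substr(stripped, 1, 4)                 # "1969" "1969"
type <- substr(stripped, 6, nchar(stripped))   # "10" "7020"

# One logical per input name, so the condition is not length one.
# As far as I know, if() with such a condition is an error on R >= 4.2.0
# (older versions warned and used only the first element).
in_codes <- type %in% names(codes)
print(in_codes)

# paste0() has no `sep` argument: "_" is treated as one more string
with_paste0 <- paste0(type, year, sep = "_")
print(with_paste0)   # "101969_" "70201969_"

# paste() does honor `sep`
with_paste <- paste(type, year, sep = "_")
print(with_paste)    # "10_1969" "7020_1969"
```

Vectorize() sidesteps the first issue by calling the function once per name, so type is always a single string inside the if().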
And test:
inc_rr %>%
rename_with(vrename_vars, starts_with("x")) %>% colnames()
[1] "fips" "geoname" "region" "tablename"
[5] "inc_1969" "pop_1969" "emp_1969" "wemp_1969"
[9] "inc_1970" "pop_1970" "emp_1970" "wemp_1970"
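As an alternative to Vectorize(), the lookup itself can be written so that it handles a whole vector of names in one call. The following is only a sketch under a few assumptions: the lookup is stored as a named character vector (here called code_names) instead of a list, and the function name rename_vars_vec and the sample names are hypothetical. As in the function above, names whose line code is not in the lookup are returned with only the "x" prefix removed.

```r
# Line code -> short name, as a named character vector (assumed layout)
code_names <- c("10" = "inc", "20" = "pop", "7010" = "emp", "7020" = "wemp")

rename_vars_vec <- function(var_name) {
  stripped <- sub("^x", "", var_name)            # drop the "x" prefix
  year <- substr(stripped, 1, 4)                 # first four characters
  type <- substr(stripped, 6, nchar(stripped))   # everything after "_"
  mapped <- unname(code_names[type])             # NA when the code is unknown
  # Keep the stripped name when there is no match, else build name_year
  ifelse(is.na(mapped), stripped, paste(mapped, year, sep = "_"))
}

# Hypothetical sample names, including one code that is not in the lookup
renamed <- rename_vars_vec(c("x1969_10", "x1969_7020", "x1980_20", "x1969_30"))
print(renamed)   # "inc_1969" "wemp_1969" "pop_1980" "1969_30"
```

Because it accepts a character vector directly, it should be usable in the pipeline without a wrapper, e.g. inc_rr %>% rename_with(rename_vars_vec, starts_with("x")).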