Row-wise calculation on long-format data with data.table


Hi Stack Overflow community,

I'm new to R, so thanks for your patience. I'm learning data.table and want to use it for a row-wise calculation. The data has 6 columns; the attached image shows only a small part of it: [image: sample of the data]

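I want to calculate the unemployment rate, i.e. "Unemployed total / Labour force total". Because the data are in long format, both "Unemployed total" and "Labour force total" are values of the "variable" column. I want the unemployment rate by "sex", "region" and "data_type".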

My question is: how can I calculate the unemployment rate this way using data.table (not dplyr)? Ideally I'd like the unemployment rate results to appear as rows.

For example: [image: desired output]
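The rate can also be computed without reshaping: group by the identifying columns and pick the two totals out of the "variable" column inside each group. A minimal sketch on made-up numbers; the sample values and the label "Unemployment rate" are hypothetical, and it assumes every group contains exactly one "Unemployed total" row and one "Labour force total" row. The rates are appended as extra rows, so the result stays in long format.

```r
library(data.table)

# Hypothetical long-format sample with the same columns as d (values made up)
d <- data.table(
  date      = as.Date("2023-02-01"),
  variable  = rep(c("Unemployed total", "Labour force total"), times = 2),
  sex       = rep(c("Males", "Females"), each = 2),
  region    = "Victoria",
  value     = c(50, 1000, 30, 750),
  data_type = "Trend"
)

# One rate per group: pick each total out of `variable` within the group
rates <- d[, .(
  variable = "Unemployment rate",
  value    = value[variable == "Unemployed total"] /
             value[variable == "Labour force total"]
), by = .(date, sex, region, data_type)]

# Append the rates as new rows; use.names = TRUE matches columns by name
d_long <- rbind(d, rates, use.names = TRUE)
print(d_long)
```

If a group could be missing one of the totals, the dcast approach is more forgiving, since it fills the gap with NA instead of failing.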

Thanks. I've also attached code that generates the raw dataset:

rm(list=ls())

# Load the packages used
library(readabs)
library(tidyverse)
library(dplyr)
library(stringr)
library(lubridate)
library(data.table)
library(tidyr)
library(fy)
library(psych)
# note: loading plyr after dplyr masks several dplyr functions (plyr warns about this)
library(plyr)

#**************************************************************************************************************

#Labour force indicators

abs_labour_force_base <- read_abs(cat_no = '6202.0', 
                                  tables = 12,
                                  series_id = NULL,
                                  metadata = TRUE,
                                  show_progress_bars = FALSE,
                                  retain_files = FALSE,
                                  check_local = FALSE
)

#split out series column

abs_labour_force_base <- separate_series(abs_labour_force_base)

setDT(abs_labour_force_base)

# create raw data set

d <- 
  lf_monthly_qld  <- abs_labour_force_base[
      series_1 %in% c("Unemployed total","Labour force total") & 
      series_2 %in% c("Males","Female") &
      series_3 %in% c("Australia","Victoria") &
      series_type %in% c("Seasonally Adjusted", "Trend") &
      date > as_date("2023-01-08"),]


keep_cols = c("date", "series_1", "series_2","series_3","value","series_type")

d <- d[, ..keep_cols]


setnames(d, c("date", "variable", "sex", "region", "value", "data_type"))

I haven't tried much yet, because I'm not sure what the best way to do this is.

Answer

Use dcast:

d <- 
  abs_labour_force_base[
    series_1 %in% c("Unemployed total","Labour force total") & 
      series_2 %in% c("Males","Females") &
      series_3 %in% c("Australia","Victoria") &
      series_type %in% c("Seasonally Adjusted", "Trend") &
      date > "2023-01-08"]

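# Reshape long -> wide: one column per series_1 value, one row per group
Ans <- dcast(d, date + series_type + series_2 + series_3 ~ series_1, value.var = "value")
# Row-wise unemployment rate, added by reference
Ans[, "UnemployedRate" := `Unemployed total` / `Labour force total`]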
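Since the question asks for the rate as rows, melt() can turn the wide result back into long format. A sketch of the full round trip on a small made-up table; the values are hypothetical and the column names follow the renamed d from the question rather than the raw series_* columns.

```r
library(data.table)

# Hypothetical sample in the question's long format (values made up)
d <- data.table(
  date      = as.Date("2023-02-01"),
  variable  = rep(c("Unemployed total", "Labour force total"), times = 2),
  sex       = rep(c("Males", "Females"), each = 2),
  region    = "Australia",
  value     = c(80, 2000, 60, 1200),
  data_type = "Seasonally Adjusted"
)

# Long -> wide: one column per level of `variable`
wide <- dcast(d, date + data_type + sex + region ~ variable, value.var = "value")

# Row-wise rate, computed by reference
wide[, `Unemployment rate` := `Unemployed total` / `Labour force total`]

# Wide -> long again, so the rate appears as rows next to the two totals
long <- melt(wide,
             id.vars = c("date", "data_type", "sex", "region"),
             variable.name = "variable", value.name = "value")
print(long)
```

In the melted result "variable" is a factor by default; pass variable.factor = FALSE to melt() if a character column is preferred.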

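Note the small error in your original question: the filter should be series_2 %in% c("Males", "Females") (you had "Female"). You can use the hutils package and its %ein% operator to catch this kind of error.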

library(hutils)
abs_labour_force_base[series_2 %ein% c("Males", "Female")]
#> Error: `rhs` contained Female, but this value was not found in `lhs = series_2`. All values of `rhs` must be in `lhs`. Ensure you have specified `rhs` correctly.
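If adding a dependency is undesirable, a similar guard can be sketched in base R. The helper check_values_exist below is hypothetical: it only mimics the idea behind hutils' %ein% (fail loudly when a requested value never occurs) and is not its implementation.

```r
# Hypothetical helper: like lhs %in% rhs, but stops if any value in rhs
# does not occur anywhere in lhs (e.g. a typo such as "Female")
check_values_exist <- function(lhs, rhs) {
  missing_vals <- setdiff(rhs, lhs)
  if (length(missing_vals) > 0) {
    stop("Not found in data: ", paste(missing_vals, collapse = ", "))
  }
  lhs %in% rhs
}

series_2 <- c("Males", "Females", "Persons")
check_values_exist(series_2, c("Males", "Females"))  # TRUE TRUE FALSE
```

Inside a data.table filter it would be used as abs_labour_force_base[check_values_exist(series_2, c("Males", "Females"))].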