I'm trying to subset my data.table, which is in long format and named tmp:
tmp <- structure(list(Year = c(1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021), variable = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L), .Label = c("FORT ERIE", "GRIMSBY",
"LINCOLN", "NIAGARA FALLS", "NIAGARA-ON-THE-LAKE", "PELHAM",
"PORT COLBORNE", "ST. CATHARINES", "THOROLD", "WAINFLEET", "WELLAND",
"WEST LINCOLN"), class = "factor"), value = c(23113L, 24030L,
24096L, 23253L, 26006L, 27183L, 28143L, 29925L, 29960L, 30710L,
32901L, 15770L, 15565L, 15797L, 16956L, 18520L, 19585L, 21297L,
23937L, 25325L, 27314L, 28883L, 14247L, 14460L, 14196L, 14391L,
17149L, 18801L, 20612L, 21722L, 22487L, 23787L, 25719L, 67163L,
69420L, 70960L, 72107L, 75399L, 76917L, 78815L, 82184L, 82997L,
88071L, 94415L, 12552L, 12485L, 12186L, 12494L, 12945L, 13238L,
13839L, 14587L, 15400L, 17511L, 19090L, 9997L, 10070L, 11104L,
12137L, 13328L, 14343L, 15272L, 16155L, 16598L, 17110L, 18192L,
21420L, 20535L, 19225L, 18281L, 18766L, 18451L, 18450L, 18599L,
18424L, 18306L, 20033L, 109722L, 123350L, 124018L, 123455L, 129300L,
130926L, 129170L, 131989L, 131400L, 133113L, 136803L, 15065L,
14945L, 15412L, 16131L, 17542L, 17883L, 18048L, 18224L, 17931L,
18801L, 23816L, 5486L, 6065L, 6000L, 5955L, 6203L, 6253L, 6258L,
6601L, 6356L, 6372L, 6887L, 44397L, 45050L, 45448L, 45054L, 47914L,
48411L, 48402L, 50331L, 50631L, 52293L, 55750L, 8396L, 9460L,
9846L, 9918L, 10864L, 11513L, 12268L, 13167L, 13837L, 14500L,
15454L)), row.names = c(NA, -132L), class = c("data.table", "data.frame"))
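This data table has 132 rows and three columns: Year, variable, and value. Year is the year the data were collected, variable is the name of the location, and value is the total population. Each location has 11 rows.
I want to create a subset that excludes the rows for NIAGARA FALLS, ST. CATHARINES, and WELLAND. This works: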
tmpsmall9 <- subset(tmp, variable != 'NIAGARA FALLS') # subset excludes NF
tmpsmall9 <- subset(tmpsmall9, variable != 'ST. CATHARINES') # subset excludes ST. KITTY
tmpsmall9 <- subset(tmpsmall9, variable != "WELLAND") # subset excludes Welland
View(tmpsmall9)
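This step-by-step solution removes the 33 observations (i.e., rows) associated with these three levels of the variable column (Niagara Falls, St. Catharines, and Welland), leaving 99 rows. However, I think there must be a more efficient solution. So I tried this: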
tmpsmall9 <- subset(tmp, variable != c('NIAGARA FALLS','ST. CATHARINES','WELLAND'))
View(tmpsmall9)
This did not behave as expected. It leaves 121 rows: Niagara Falls still has 7 rows, St. Catharines has 8, and Welland has 7. All the other levels of variable keep all of their rows.
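What is a more efficient and accurate way to subset by factor level than my step-by-step solution? Can my second solution be modified? (If not, can someone explain why this second subset() call removed only some of the observations for these three levels rather than all of them? Can the logical operator only handle one level at a time?)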
One option is to negate the %in% operator, e.g.
library(data.table)
tmp <- structure(list(Year = c(1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021, 1971, 1976, 1981, 1986, 1991, 1996, 2001,
2006, 2011, 2016, 2021), variable = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L), .Label = c("FORT ERIE", "GRIMSBY",
"LINCOLN", "NIAGARA FALLS", "NIAGARA-ON-THE-LAKE", "PELHAM",
"PORT COLBORNE", "ST. CATHARINES", "THOROLD", "WAINFLEET", "WELLAND",
"WEST LINCOLN"), class = "factor"), value = c(23113L, 24030L,
24096L, 23253L, 26006L, 27183L, 28143L, 29925L, 29960L, 30710L,
32901L, 15770L, 15565L, 15797L, 16956L, 18520L, 19585L, 21297L,
23937L, 25325L, 27314L, 28883L, 14247L, 14460L, 14196L, 14391L,
17149L, 18801L, 20612L, 21722L, 22487L, 23787L, 25719L, 67163L,
69420L, 70960L, 72107L, 75399L, 76917L, 78815L, 82184L, 82997L,
88071L, 94415L, 12552L, 12485L, 12186L, 12494L, 12945L, 13238L,
13839L, 14587L, 15400L, 17511L, 19090L, 9997L, 10070L, 11104L,
12137L, 13328L, 14343L, 15272L, 16155L, 16598L, 17110L, 18192L,
21420L, 20535L, 19225L, 18281L, 18766L, 18451L, 18450L, 18599L,
18424L, 18306L, 20033L, 109722L, 123350L, 124018L, 123455L, 129300L,
130926L, 129170L, 131989L, 131400L, 133113L, 136803L, 15065L,
14945L, 15412L, 16131L, 17542L, 17883L, 18048L, 18224L, 17931L,
18801L, 23816L, 5486L, 6065L, 6000L, 5955L, 6203L, 6253L, 6258L,
6601L, 6356L, 6372L, 6887L, 44397L, 45050L, 45448L, 45054L, 47914L,
48411L, 48402L, 50331L, 50631L, 52293L, 55750L, 8396L, 9460L,
9846L, 9918L, 10864L, 11513L, 12268L, 13167L, 13837L, 14500L,
15454L)), row.names = c(NA, -132L), class = c("data.table", "data.frame"))
tmpsmall9 <- subset(tmp, variable != 'NIAGARA FALLS') # subset excludes NF
tmpsmall9 <- subset(tmpsmall9, variable != 'ST. CATHARINES') # subset excludes ST. KITTY
tmpsmall9 <- subset(tmpsmall9, variable != "WELLAND") # subset excludes Welland
tmpsmall9_ver2 <- tmp[!(variable %in% c('NIAGARA FALLS','ST. CATHARINES','WELLAND'))]
all.equal(tmpsmall9, tmpsmall9_ver2)
#> [1] TRUE
Created on 2023-05-05 with reprex v2.0.2
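As for why the second attempt removed only some rows: != compares element by element and recycles the shorter vector, so each row is tested against just one of the three names, chosen by its position. With 132 rows and a length-3 vector the lengths divide evenly, so R gives no warning. The sketch below illustrates this on a small toy table; the names toy and excluded are made up for the illustration and are not part of the original data. It uses a base data.frame to stay self-contained, and the same subset() calls should behave the same way on a data.table.

```r
# Toy stand-in for tmp (hypothetical data): 4 locations x 3 years, long format.
toy <- data.frame(
  Year     = rep(c(2011, 2016, 2021), times = 4),
  variable = factor(rep(c("GRIMSBY", "NIAGARA FALLS",
                          "ST. CATHARINES", "WELLAND"), each = 3)),
  value    = 1:12
)
excluded <- c("NIAGARA FALLS", "ST. CATHARINES", "WELLAND")

# `!=` recycles `excluded` down the rows:
# row 1 vs excluded[1], row 2 vs excluded[2], row 3 vs excluded[3],
# row 4 vs excluded[1], and so on. A row is dropped only when it happens
# to line up with its own name.
wrong <- subset(toy, variable != excluded)

# `%in%` tests every row against the whole set, so negating it drops
# all rows for all three locations.
right <- subset(toy, !(variable %in% excluded))

nrow(wrong)  # 9: only one row per excluded location was removed
nrow(right)  # 3: only the GRIMSBY rows remain
```

So the second solution can be kept as a subset() call by swapping != for a negated %in%. The excluded names remain as unused factor levels afterwards; droplevels() removes them if that matters for plotting or tabulation.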