I'm having trouble with multiple conditions in R. My data looks like this:
Region in UK Year Third column (year.city)
Liverpool 2008
Manchester 2010
Liverpool 2016
Chester 2015
Birmingham 2016
Blackpool 2012
Birmingham 2005
Chester 2009
Liverpool 2005
Hull 2011
Leeds 2013
Liverpool 2014
Bradford 2008
London 2010
Coventry 2009
Cardiff 2016
Liverpool 2007
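What I want to create is a third column containing these groups: Liverpool before 2010, Liverpool after 2010, other cities before 2010, and other cities after 2010. I tried several approaches, such as mutate, but couldn't get it to work. Can you help me? Thanks.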
I would do as @dvibisan suggested and use dplyr.
# Create a dataframe
df <- structure(list(`Region in UK` = c("Liverpool", "Manchester", "Liverpool",
"Chester", "Birmingham", "Blackpool", "Birmingham", "Chester",
"Liverpool", "Hull", "Leeds", "Liverpool", "Bradford", "London",
"Coventry", "Cardiff", "Liverpool"),
Year = c(2008L, 2010L, 2016L, 2015L, 2016L, 2012L, 2005L, 2009L, 2005L, 2011L, 2013L, 2014L, 2008L, 2010L, 2009L, 2016L, 2007L)),
row.names = c(NA, -17L), class = c("data.table", "data.frame"))
# Load the dplyr library to use mutate and if_else (if there were more than 2 conditions of interest for each variable could use case_when)
library(dplyr)
# Create a new column using mutate, pasting together two conditions
df <-
df %>%
mutate(`Third column (year.city)` = paste0(if_else(grepl("Liverpool", `Region in UK`, fixed = TRUE), `Region in UK`, "Other cities"),
if_else(Year < 2010, " before 2010", " 2010 or after")))
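The comment above mentions case_when for situations with more conditions. A minimal sketch of that variant follows, assuming dplyr is installed; the data frame `cities` and the column name `year_city` are made up for illustration and are not from the answer. Conditions are checked in order and the first match wins, so the later branches do not need to repeat the earlier tests.

```r
library(dplyr)

# Hypothetical four-row sample; the real data would go here instead
cities <- data.frame(
  Region_in_UK = c("Liverpool", "Manchester", "Liverpool", "Chester"),
  Year = c(2008, 2010, 2016, 2009)
)

cities <- cities %>%
  mutate(year_city = case_when(
    # Rows are assigned to the first condition that is TRUE
    Region_in_UK == "Liverpool" & Year < 2010 ~ "Liverpool before 2010",
    Region_in_UK == "Liverpool"               ~ "Liverpool 2010 or after",
    Year < 2010                               ~ "Other cities before 2010",
    TRUE                                      ~ "Other cities 2010 or after"
  ))

print(cities)
```

Adding a further group (say, a second city of interest) would only need one or two extra lines in the case_when call.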
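I think the simplest way is to use vectorization in base R (the columns here are named Region.in.UK and Year, as in the printed result below):
# create vector of category labels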
vec <- c("Other cities after 2010", "Liverpool after 2010", "Other cities before 2010", "Liverpool before 2010")
# create index vector
ix <- 1 + (df$Region.in.UK == "Liverpool") + 2*(df$Year < 2010)
# index the categories-vector with the index-vector
df$year.city <- vec[ix]
Result:
> df
Region.in.UK Year year.city
1 Liverpool 2008 Liverpool before 2010
2 Manchester 2010 Other cities after 2010
3 Liverpool 2016 Liverpool after 2010
4 Chester 2015 Other cities after 2010
5 Birmingham 2016 Other cities after 2010
6 Blackpool 2012 Other cities after 2010
7 Birmingham 2005 Other cities before 2010
8 Chester 2009 Other cities before 2010
9 Liverpool 2005 Liverpool before 2010
10 Hull 2011 Other cities after 2010
11 Leeds 2013 Other cities after 2010
12 Liverpool 2014 Liverpool after 2010
13 Bradford 2008 Other cities before 2010
14 London 2010 Other cities after 2010
15 Coventry 2009 Other cities before 2010
16 Cardiff 2016 Other cities after 2010
17 Liverpool 2007 Liverpool before 2010
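To see why the index arithmetic lands on the right label, it may help to run it over the four possible combinations of the two conditions. This is only a sketch; the names `combos`, `is_liverpool` and `before_2010` are illustrative and not part of the answer. A logical value counts as 0 or 1 in arithmetic, so the Liverpool test adds 1 and the year test adds 2, giving a distinct index from 1 to 4 for each combination.

```r
# Same label vector as in the answer above
vec <- c("Other cities after 2010", "Liverpool after 2010",
         "Other cities before 2010", "Liverpool before 2010")

# All four combinations of the two logical conditions
combos <- expand.grid(is_liverpool = c(FALSE, TRUE),
                      before_2010  = c(FALSE, TRUE))

# FALSE/TRUE are treated as 0/1 in arithmetic
combos$ix    <- 1 + combos$is_liverpool + 2 * combos$before_2010
combos$label <- vec[combos$ix]

print(combos)
```

Note that `Year < 2010` sends the year 2010 itself to the "after 2010" labels.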
Try this:
Region_in_UK = c( "Liverpool", "Manchester", "Liverpool", "Chester",
"Birmingham", "Blackpool", "Birmingham", "Chester", "Liverpool", "Hull",
"Leeds", "Liverpool", "Bradford", "London", "Coventry", "Cardiff", "Liverpool")
Year = c(2008, 2010, 2016, 2015, 2016, 2012, 2005, 2009, 2005, 2011, 2013,
2014, 2008, 2010, 2009, 2016, 2007)
df = data.frame(Region_in_UK, Year)
# If your real data frame is larger than the sample you posted, skip the
# code above and use your own data instead; just name it "df"
# (e.g.: df = your_dataframe)
library(dplyr)  # provides mutate()
df$year_city = NA_character_
df = mutate(df, year_city =
  ifelse(grepl("Liverpool", Region_in_UK) & Year < 2010,
         "Liverpool before 2010", year_city))
df = mutate(df, year_city =
  ifelse(grepl("Liverpool", Region_in_UK) & Year >= 2010,
         "Liverpool 2010 and after", year_city))
df = mutate(df, year_city =
  ifelse(!grepl("Liverpool", Region_in_UK) & Year < 2010,
         "Other before 2010", year_city))
df = mutate(df, year_city =
  ifelse(!grepl("Liverpool", Region_in_UK) & Year >= 2010,
         "Other 2010 and after", year_city))
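With base R you can do the following (note that Year > 2010 puts 2010 itself in the earlier group, so the codes are 1 = Liverpool up to 2010, 2 = Liverpool after 2010, 3 = other up to 2010, 4 = other after 2010):
transform(df, year.city = factor(paste(sub('^((?!Liver).)*$', 'other', Region_in_UK, perl = TRUE), Year > 2010), labels = 1:4))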
Region_in_UK Year year.city
1 Liverpool 2008 1
2 Manchester 2010 3
3 Liverpool 2016 2
4 Chester 2015 4
5 Birmingham 2016 4
6 Blackpool 2012 4
7 Birmingham 2005 3
8 Chester 2009 3
9 Liverpool 2005 1
10 Hull 2011 4
11 Leeds 2013 4
12 Liverpool 2014 2
13 Bradford 2008 3
14 London 2010 3
15 Coventry 2009 3
16 Cardiff 2016 4
17 Liverpool 2007 1
You can also do:
transform(df, m = factor(paste(!grepl("Liverpool", Region_in_UK), Year > 2010), labels = 1:4))
Or:
transform(df, m = factor(paste(sub('(Liverpool)|.*', '\\1', Region_in_UK), Year <= 2010), labels = 4:1))
Region_in_UK Year m
1 Liverpool 2008 1
2 Manchester 2010 3
3 Liverpool 2016 2
4 Chester 2015 4
5 Birmingham 2016 4
6 Blackpool 2012 4
7 Birmingham 2005 3
8 Chester 2009 3
9 Liverpool 2005 1
10 Hull 2011 4
11 Leeds 2013 4
12 Liverpool 2014 2
13 Bradford 2008 3
14 London 2010 3
15 Coventry 2009 3
16 Cardiff 2016 4
17 Liverpool 2007 1
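The numeric codes 1 to 4 can be swapped for readable labels. Here is a rough sketch along the same lines, using a made-up four-row data frame `cities` and a helper vector `key` that are not part of the answer. Spelling out `levels` explicitly also avoids a pitfall of the one-liners above: with `labels = 1:4` alone, factor() raises an error if one of the four combinations does not occur in the data.

```r
# Hypothetical sample covering all four groups
cities <- data.frame(
  Region_in_UK = c("Liverpool", "Manchester", "Liverpool", "Chester"),
  Year = c(2008, 2010, 2016, 2015)
)

# Same key as in the answer: "<not Liverpool> <after 2010>"
key <- paste(!grepl("Liverpool", cities$Region_in_UK), cities$Year > 2010)

cities$year.city <- factor(
  key,
  levels = c("FALSE FALSE", "FALSE TRUE", "TRUE FALSE", "TRUE TRUE"),
  labels = c("Liverpool 2010 or before", "Liverpool after 2010",
             "Other 2010 or before", "Other after 2010")
)

print(cities)
```

As in the answer, `Year > 2010` groups 2010 with the earlier years; use `Year >= 2010` if 2010 should count as "after".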