My knowledge of R and scripting is almost nonexistent, so I hope you'll be patient with this basic question.
library(lubridate)
date.depature <- c("2016.06.16", "2016.11.16", "2017.01.05", "2017.01.12", "2017.02.25")
airport.departure <- c("CDG", "QNY", "QXO", "CDG", "QNY")
airport.arrival <- c("SYD", "CDG", "QNY", "SYD", "QXO")
amount <- c("1", "3", "1", "10", "5")
date.depature <- as_date(date.depature)
df <- data.frame(date.depature, airport.departure, airport.arrival, amount)
xtabs(as.integer(amount) ~ airport.arrival + airport.departure, df)
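With this code we get the sums of the amounts as a matrix, with the airports as rows and columns. Now I only need the result for particular periods: the year 2017, January 2017, and everything up to January 2017.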
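Since you are already using lubridate, I'll show you an approach with dplyr (part of the tidyverse, alongside lubridate). The same approach covers each case: use filter together with the month, year and as_date functions from lubridate to build the conditions that filter your data, then use the pipe %>% to pass the filtered data along to xtabs.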
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
date.depature <- c("2016.06.16", "2016.11.16", "2017.01.05", "2017.01.12", "2017.02.25")
airport.departure <- c("CDG", "QNY", "QXO", "CDG", "QNY")
airport.arrival <- c("SYD", "CDG", "QNY", "SYD", "QXO")
amount <- c("1", "3", "1", "10", "5")
date.depature <- as_date(date.depature)
df <- data.frame(date.depature, airport.departure, airport.arrival, amount)
# For 2017
df %>%
filter(year(date.depature) == 2017) %>%
xtabs(as.integer(amount) ~ airport.arrival + airport.departure, .)
#> airport.departure
#> airport.arrival CDG QNY QXO
#> CDG 0 0 0
#> QNY 0 0 1
#> QXO 0 4 0
#> SYD 2 0 0
# 2017.01
df %>%
filter(year(date.depature) == 2017, month(date.depature) == 1) %>%
xtabs(as.integer(amount) ~ airport.arrival + airport.departure, .)
#> airport.departure
#> airport.arrival CDG QNY QXO
#> CDG 0 0 0
#> QNY 0 0 1
#> QXO 0 0 0
#> SYD 2 0 0
# until 2017.01
df %>%
filter(date.depature <= as_date("2017.01.01")) %>%
xtabs(as.integer(amount) ~ airport.arrival + airport.departure, .)
#> airport.departure
#> airport.arrival CDG QNY QXO
#> CDG 0 3 0
#> QNY 0 0 0
#> QXO 0 0 0
#> SYD 1 0 0
Created on 2018-11-19 by the reprex package (v0.2.1)
Why not make amount numeric when you create it, rather than converting it later inside df? Just get rid of the double quotes:
amount <- c(1, 3, 1, 10, 5)
or
amount <- as.integer(c("1", "3", "1", "10", "5"))
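This matters because as.integer(df$amount) does not return c(1, 3, 1, 10, 5). When you create the data frame df, the character vector is coerced to class "factor" (the default in R versions before 4.0.0, where data.frame() uses stringsAsFactors = TRUE), so what you actually get is the factor's internal level codes: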
as.integer(df$amount)
#[1] 1 3 1 2 4
The correct way is to convert to character first:
as.integer(as.character(df$amount))
#[1] 1 3 1 10 5
Or, more simply, define the data with a numeric amount from the start:
date.depature <- c("2016.06.16", "2016.11.16", "2017.01.05", "2017.01.12", "2017.02.25")
airport.departure <- c("CDG", "QNY", "QXO", "CDG", "QNY")
airport.arrival <- c("SYD", "CDG", "QNY", "SYD", "QXO")
amount <- c(1, 3, 1, 10, 5)
date.depature <- as_date(date.depature)
df <- data.frame(date.depature, airport.departure, airport.arrival, amount)
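To see the factor conversion in isolation, here is a minimal sketch. It builds the factor explicitly with factor(), so it behaves the same whatever the stringsAsFactors default is in your R version. The variable names codes and values are only for illustration.

```r
# Build the factor explicitly so the result does not depend on the R version.
amount <- factor(c("1", "3", "1", "10", "5"))

# Levels are sorted as strings, so "10" comes before "3".
levels(amount)

# as.integer() on a factor returns the internal level codes, not the labels.
codes <- as.integer(amount)

# Converting to character first recovers the numbers that were typed in.
values <- as.integer(as.character(amount))

print(codes)
print(values)
```

The codes are positions in the sorted level list, which is why they can look plausible while being wrong.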
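Now for the question.
This is basically a subsetting problem.
Subset the data to extract the years and months you need, then run the same xtabs command on each subset.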
df1 <- df[year(df$date.depature) == 2017, ]
df2 <- df1[month(df1$date.depature) == 1, ]
df3 <- rbind(df[year(df$date.depature) < 2017, ], df2)
Now run xtabs on the sub-data-frames above.
xtabs(amount ~ airport.arrival + airport.departure, df1)
xtabs(amount ~ airport.arrival + airport.departure, df2)
xtabs(amount ~ airport.arrival + airport.departure, df3)
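Since the three subsets follow the same pattern, they could also be wrapped in a small function. This is only a sketch under a few assumptions: amount_by_route is a hypothetical helper name, not part of any package, and it parses the dates with base R's as.Date() and an explicit format instead of lubridate, so that it runs without extra packages.

```r
# Same example data as above; dates parsed with base R (assumed format "%Y.%m.%d").
df <- data.frame(
  date.depature     = as.Date(c("2016.06.16", "2016.11.16", "2017.01.05",
                                "2017.01.12", "2017.02.25"), format = "%Y.%m.%d"),
  airport.departure = c("CDG", "QNY", "QXO", "CDG", "QNY"),
  airport.arrival   = c("SYD", "CDG", "QNY", "SYD", "QXO"),
  amount            = c(1, 3, 1, 10, 5)
)

# Hypothetical helper: sum amount per route for departures between from and to (inclusive).
amount_by_route <- function(data, from, to) {
  keep <- data$date.depature >= as.Date(from) & data$date.depature <= as.Date(to)
  xtabs(amount ~ airport.arrival + airport.departure, data[keep, ])
}

tab_2017    <- amount_by_route(df, "2017-01-01", "2017-12-31")           # whole year 2017
tab_2017_01 <- amount_by_route(df, "2017-01-01", "2017-01-31")           # January 2017 only
tab_until   <- amount_by_route(df, min(df$date.depature), "2017-01-31")  # up to and including January 2017

print(tab_2017_01)
```

Like df3, the last call includes January 2017 itself; move the upper bound to "2016-12-31" if you want only the flights before it.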
You need to subset on date.depature inside the xtabs call. For year == 2017:
xtabs(as.integer(amount) ~ airport.arrival + airport.departure, df[year(df$date.depature)==2017,])
For year == 2017 and month == 1:
xtabs(as.integer(amount) ~ airport.arrival + airport.departure, df[year(df$date.depature)==2017 & month(df$date.depature)==1,])
Everything before January 2017:
xtabs(as.integer(amount) ~ airport.arrival + airport.departure, df[df$date.depature<as_date("2017-01-01"),])