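I am conducting a creel survey and trying to build a random date generator in which weekends are weighted more heavily than weekdays. So far I have a simplistic random date generator that does not take the type of day into account. We expect greater fishing pressure on weekends (since more people have time to fish then), but I have not found a way to pick random days without introducing bias. I want to select 15 days within a given month.
I have already built a simple random date generator: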
# start and end of the month (assumed here from the output shown below)
day.start <- "2019-11-01"
day.end <- "2019-11-30"
dates <- data.frame(date = seq.Date(as.Date(day.start), as.Date(day.end), by = "day"))
dates
sample(dates$date, size = 15, replace = FALSE)
[1] "2019-11-10" "2019-11-06" "2019-11-04" "2019-11-27" "2019-11-30" "2019-11-15"
[7] "2019-11-18" "2019-11-21" "2019-11-13" "2019-11-01" "2019-11-19" "2019-11-25"
[13] "2019-11-07" "2019-11-02" "2019-11-23"
Ideally, I would end up with a final product that lets me enter the start and end of the month and outputs 15 random days.
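Here is a somewhat general function that does what you need. It takes the start date, the end date, and the weight you want to put on weekends (relative to 1 for weekdays) as its own arguments, and passes any additional arguments (size, replace, etc.) on to sample(). It has no dependencies beyond base R.
However, if you are sampling without replacement, you may want to use the sampling package, as suggested in Jan van der Laan's answer.
Explanation in the comments in the code below: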
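rday = function(
  start_day = as.Date("2019-01-01"),
  end_day = as.Date("2019-01-31"),
  weekend_weight = 2,
  ...
) {
  # coerce the inputs to Date if they are not already
  if (! "Date" %in% class(start_day)) start_day = as.Date(start_day)
  if (! "Date" %in% class(end_day)) end_day = as.Date(end_day)
  # every day from start to end, inclusive
  dates = seq(start_day, end_day, by = "1 day")
  # weight 1 for weekdays, weekend_weight for Saturdays and Sundays
  # (weekdays() returns locale-dependent names; this assumes an English locale)
  weights = rep(1, length(dates))
  weights[weekdays(dates) %in% c("Saturday", "Sunday")] = weekend_weight
  # size, replace, etc. are passed through to sample()
  sample(dates, ..., prob = weights)
}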
rday(size = 15)
# [1] "2019-01-24" "2019-01-07" "2019-01-21" "2019-01-15" "2019-01-27" "2019-01-04" "2019-01-30" "2019-01-12"
# [9] "2019-01-11" "2019-01-08" "2019-01-20" "2019-01-01" "2019-01-03" "2019-01-19" "2019-01-29"
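To make concrete what the weights mean for a single draw, here is a minimal base R sketch using the same January 2019 defaults as rday. The names is_weekend and p_first are illustrative only, and the weekend test uses format(dates, "%u") instead of weekdays(), which is an assumption made here to avoid depending on the locale.

```r
# All days in January 2019, the default range of rday()
dates <- seq(as.Date("2019-01-01"), as.Date("2019-01-31"), by = "1 day")

# "%u" is the ISO weekday number: 1 = Monday, ..., 6 = Saturday, 7 = Sunday
is_weekend <- as.integer(format(dates, "%u")) >= 6

# Weight 2 for weekend days, 1 for weekdays (weekend_weight = 2)
weights <- ifelse(is_weekend, 2, 1)

# sample() normalizes prob internally; this is the probability of each
# date being picked on the first draw
p_first <- weights / sum(weights)

sum(is_weekend)                 # 8 weekend days, 23 weekdays
unique(p_first[is_weekend])     # 2/39 for a weekend day
unique(p_first[!is_weekend])    # 1/39 for a weekday
```

These are per-draw probabilities. When 15 of 31 days are drawn without replacement, the chance that a given day ends up in the sample is a different quantity, which is the point behind the recommendation to consider the sampling package.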
# Generate initial data; as in question
day_start <- as.Date("2010-10-01")
day_end <- as.Date("2010-10-31")
dates <- data.frame(date = seq.Date(day_start, day_end, by = "day"))
# Determine inclusion probabilities for each date; give weekend a higher
# probability.
dates$day <- as.numeric(format(dates$date, "%u"))
dates$psamp <- ifelse(dates$day >= 6, 0.2, 0.1)
# Make sure probabilities add up to the required sample size
samplesize <- 15
dates$psamp <- dates$psamp * samplesize/sum(dates$psamp)
# Do not use sample for sampling without replacement with unequal probabilities!
# The sampling package has a large number of routines for sampling without
# replacement and unequal probabilities. The following gives a fixed size sample
# (sum dates$psamp)
library(sampling)
dates$selected <- UPrandomsystematic(dates$psamp)
As for why I do not use sample(): as the comment in the code warns, it is not suited to sampling without replacement with unequal probabilities.
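The warning about sample() can be checked empirically. The following is a rough base R simulation, not part of either answer: it draws many samples of 15 days with sample(..., replace = FALSE, prob = ...) and compares how often each day is actually selected with the intended inclusion probabilities. The variable names (target, realized, n_sim) and the number of repetitions are arbitrary choices for this sketch.

```r
set.seed(1)  # only to make the simulation repeatable

# Same month and relative weights as in the code above
dates <- seq(as.Date("2010-10-01"), as.Date("2010-10-31"), by = "day")
is_weekend <- as.integer(format(dates, "%u")) >= 6
w <- ifelse(is_weekend, 0.2, 0.1)
n <- 15

# Intended inclusion probabilities: proportional to w and summing to n
target <- w * n / sum(w)

# Repeatedly draw n days without replacement using sample() and record,
# for every date, whether it was picked (one column per repetition)
n_sim <- 5000
hits <- replicate(n_sim, {
  picked <- sample(seq_along(dates), size = n, replace = FALSE, prob = w)
  tabulate(picked, nbins = length(dates))
})

# Share of repetitions in which each date was selected
realized <- rowMeans(hits)

round(c(
  target_weekend   = target[is_weekend][1],
  realized_weekend = mean(realized[is_weekend]),
  target_weekday   = target[!is_weekend][1],
  realized_weekday = mean(realized[!is_weekend])
), 3)
```

In this setup the realized inclusion probability for weekend days comes out below the target, and the one for weekdays above it: sample() draws one day at a time and renormalizes the remaining weights after each draw, so the 2:1 ratio holds per draw but not for membership in the final sample.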