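My data are telemetry locations of about 40 animals (id), and I have defined 3 areas. The first is AR, the breeding area; AM is migration; and AA is the feeding area. The first location of every animal is in AR. However, an animal that is still in the breeding period (in AR) sometimes goes out to AM a few times and then comes back to AR. Only when an animal has nothing but AM does it actually start migrating, until it reaches the feeding area AA. So the animals start in AR, then begin migrating (AM), and then reach the feeding area AA.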
I am trying to create a new column based on some conditions, and I don't know how to do it. For example, I have this data frame:
id area
2304 AR
2304 AR
2304 AR
2304 AM # this AM, for example, can repeat up to 20 times and then come back to AR
2304 AM
2304 AR
2304 AR
2304 AR
2304 AM
2304 AM
2304 AM
2304 AM
2304 ...
2304 AM
2304 AM
2304 AM
2304 AA
2304 AA
2304 ...
2304 AA
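So, when there are x AR rows followed by anywhere from one to 20 AM rows and then a return to AR, I want the new column to contain AR. When there are x AM rows that are only AM and never go back to AR, I want the new column to contain AM.
AA is not a problem: AA is always AA.
I expect this: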
id area fixed_area
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AR # this AM, for example, can repeat up to 20 times and then come back to AR
2304 AM AR
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AM
2304 AM AM
2304 AM AM
2304 AM AM
2304 ... ...
2304 AM AM
2304 AM AM
2304 AM AM
2304 AA AA
2304 AA AA
2304 ... ...
2304 AA AA
I tried the following, but AA is missing. Maybe the problem is that this separation has to be done for each animal (id):
> table(df$area)
AA AM AR
31460 39101 28820
> class(df$area)
[1] "character"
> idx <- with(rle(as.character(df$area)), rep(seq_along(lengths), lengths))
> df$fixed_area <- with(df, replace(area, idx < max(idx[area == 'AM']), 'AR'))
> table(df$fixed_area)
AM AR
145 99236
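The attempt above computes idx and max(idx[area == 'AM']) over the whole data frame, so with about 40 animals stacked together, every row before the very last AM run in the file (including other animals' AA rows) is set to AR, which would explain why AA disappears. Below is a minimal sketch of the same idea applied separately to each id with base R's ave(), assuming rows are in time order within each animal and that AM never occurs after AA. The toy data and the helper name fix_one are hypothetical.

```r
# Hypothetical toy data: two animals stacked in one data frame,
# rows assumed to be in time order within each id
df <- data.frame(
  id   = rep(c("A", "B"), times = c(8, 6)),
  area = c("AR", "AR", "AM", "AR", "AM", "AM", "AA", "AA",
           "AR", "AM", "AR", "AM", "AA", "AA"),
  stringsAsFactors = FALSE
)

# Same logic as the attempt, but for a single animal's area vector
fix_one <- function(a) {
  idx <- with(rle(a), rep(seq_along(lengths), lengths))  # run number of each row
  am_runs <- idx[a == "AM"]
  if (length(am_runs) == 0) return(a)    # no AM at all: leave unchanged
  replace(a, idx < max(am_runs), "AR")   # everything before the last AM run becomes AR
}

# ave() splits area by id, applies fix_one to each piece and reassembles in the original order
df$fixed_area <- ave(df$area, df$id, FUN = fix_one)
df
```

Note that this rule keeps only each animal's last AM run as AM, regardless of its length; it does not use the 20-row threshold.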
Below is my data frame. It has more than 90,000 rows, so I only copied the first few rows:
> dput(head(df))
structure(list(DeployID = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), Start = structure(c(1477323868,
1477323946, 1477324002, 1477324044, 1477324260, 1477324480), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), End = structure(c(1477323944, 1477324000,
1477324042, 1477324170, 1477324458, 1477324542), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), What = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Dive", "Message", "Surface"), class = "factor"),
Shape = structure(c(2L, 4L, 3L, 2L, 2L, 2L), .Label = c("",
"Square", "U", "V"), class = "factor"), DepthMean = c(14.5,
16.5, 13, 14.5, 11, 12.5), DurationMean = c(76, 54, 40, 126,
198, 62), DepthMin = c(14.5, 16.5, 13, 14.5, 11, 12.5), DepthMax = c(14.5,
16.5, 13, 14.5, 11, 12.5), depth_range = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("shallow", "deep"), class = c("ordered",
"factor")), MidTime = structure(c(1477323906, 1477323973,
1477324022, 1477324107, 1477324359, 1477324511), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), year = c(2016, 2016, 2016, 2016,
2016, 2016), id = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), segmentid = c("111868_16",
"111868_16", "111868_16", "111868_16", "111868_16", "111868_16"
), mu.x = c(-4446545.25191192, -4446557.10576816, -4446565.77504969,
-4446580.81370994, -4446625.40007808, -4446652.29459533),
mu.y = c(-2305423.86124176, -2305461.88537725, -2305489.69364377,
-2305537.93137917, -2305680.93056743, -2305767.17264774),
lon = c(-39.9439956132156, -39.944102098218, -39.944179975699,
-39.9443150702825, -39.9447155964422, -39.9449571940013),
lat = c(-20.3985940756941, -20.3989161274532, -20.3991516537744,
-20.3995602097098, -20.4007713539709, -20.4015017842338),
lq_closest_filt = c(7L, 7L, 7L, 7L, 7L, 7L), dt_closest_filt = c(0.0516666666666667,
0.0702777777777778, 0.0838888888888889, 0.1075, 0.1775, 0.219722222222222
), dist_closest_filt = c(0.103680210832692, 0.141026573116106,
0.168339162761167, 0.215717097671267, 0.356168027785347,
0.440874049523752), rel.angle = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), speed = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), depth_bin = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("(0,50]", "(50,100]", "(100,150]",
"(150,200]", "(200,250]", "(250,300]", "(300,350]", "(350,400]",
"(400,450]", "(450,500]", "(500,550]", "(550,600]", "(600,650]",
"(650,700]"), class = "factor"), bat = structure(list(depth = c(-59L,
-59L, -59L, -59L, -59L, -59L)), row.names = c(NA, 6L), class = "data.frame"),
area = c("AR", "AR", "AR", "AR", "AR", "AR")), row.names = c(NA,
6L), class = "data.frame")
Does anyone know how to solve this? Thanks!
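It sounds like you want to apply some rules to decide which AM rows become AR:
- the run of consecutive AM rows is shorter than 20
- the run is not followed by AA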
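One approach is to use rle to add one column for each of these two rules. One column holds the lengths, i.e. the number of consecutive identical values in a run. The other holds the "next" area, which is what tells you whether the animal went back to the breeding area or on to the feeding area.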
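To make the two helper columns concrete, here is a small sketch on a hypothetical single-animal sequence, showing what rle returns and how count and next_area are expanded back to one value per row:

```r
# Hypothetical area sequence for a single animal
area <- c("AR", "AR", "AM", "AM", "AR", "AM", "AM", "AM", "AA", "AA")

r <- rle(area)
r$values   # "AR" "AM" "AR" "AM" "AA"
r$lengths  # 2 2 1 3 2

count     <- rep(r$lengths, r$lengths)            # length of the run each row belongs to
next_area <- rep(c(r$values[-1], NA), r$lengths)  # area of the following run; NA for the last run

data.frame(area, count, next_area)
```

Rows 3 and 4 (count 2, next_area AR) are a short excursion that should become AR, while rows 6 to 8 (next_area AA) are the actual migration and stay AM.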
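Finally, you can use a conditional statement to change a row from AM to AR when all of the following hold:
- area is AM
- next_area is not AA
- count is less than 20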
Here is the code:
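r <- rle(df$area)  # runs of consecutive identical areas
df2 <- cbind(df, next_area = rep(c(r$values[-1], NA), r$lengths), count = rep(r$lengths, r$lengths))
# AM runs shorter than 20 that are followed by a run other than AA become AR (the last run has next_area NA and is left alone)
df2$fixed_area <- ifelse(with(df2, area == "AM" & !is.na(next_area) & next_area != "AA" & count < 20), "AR", df2$area)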
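The rle in the code above also runs over all animals at once, so the next_area of one animal's last run is taken from the next animal's first run. Below is a minimal sketch that applies the same rule per id with base R's ave(), assuming rows are time-ordered within each id. The helper fix_area, its max_len parameter and the toy data are hypothetical.

```r
# Hypothetical toy data: animal A's track ends during migration (AM),
# animal B starts in AR right after it; rows assumed time-ordered within each id
df <- data.frame(
  id   = rep(c("A", "B"), times = c(5, 4)),
  area = c("AR", "AM", "AR", "AM", "AM",
           "AR", "AR", "AM", "AA"),
  stringsAsFactors = FALSE
)

# The rule from this answer, applied to one animal's area vector (max_len = 20 as in the question)
fix_area <- function(a, max_len = 20) {
  r <- rle(a)
  count     <- rep(r$lengths, r$lengths)
  next_area <- rep(c(r$values[-1], NA), r$lengths)
  short_trip <- a == "AM" & !is.na(next_area) & next_area != "AA" & count < max_len
  ifelse(short_trip, "AR", a)
}

# Run the rule separately for each id and put the results back in the original row order
df$fixed_area <- ave(df$area, df$id, FUN = fix_area)
df
```

Without the grouping, animal A's final AM run would see B's first AR as its next_area and would wrongly be switched to AR.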