R: how to filter a time series of measurements based on previous values

Question (votes: 0, answers: 1)

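I am trying to filter coral demographic data as a time series. I have a set of corals that were measured every 3 months. What I would like to do is a.) filter for all corals that at some point had a maximum diameter within a specified size range (8-12 mm in diameter), b.) remove corals that were previously larger than that size range, and c.) remove the measurements taken after a coral entered the size range, by keeping for each coral only the first measurement at which it grew into the size range (8-12 mm) and the following measurement at the next TimeStep.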

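I created a sample database and a desired database to show specifically what I am looking for. In the sample database I have also listed the criteria for each coral in the Notes column, next to that coral's first entry, for your reference. Here are the 8 corals I have included in the database and what I want done with each of them, in words: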

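Coral #1 should be deleted from the database entirely because it skipped over the desired size range of 8-12 mm.

Coral #2 should be deleted from the database because it started above the desired size range, shrank below it, and then grew back through it. I only want corals that grew into the size range without having shrunk beforehand.

Coral #3 is an example of a coral that grew into the size range (8-12 mm) and beyond without shrinking. This is a coral I want to keep, because it grew into the size range. However, I only want to include the FIRST measurement inside the size range (9 mm at TimeStep 3) and the following measurement (12 mm at TimeStep 4).

Coral #4 is an example of a coral that started above the size range and stayed above it, and therefore needs to be removed.

Coral #5 is an example of a coral that started below the range, grew into it, grew out of it, and then shrank back into the range (TimeStep 4). For this scenario I only want to include the first time the diameter fell into the range (TimeStep 2) and the following measurement (TimeStep 3), not the second time it fell into the range. This is because the first time is natural growth, whereas the second time is shrinkage and the resulting recovery (which I want to exclude or filter out).

Coral #6 is an example of a coral that started in the size range at TimeStep 1, grew out of it at the next TimeStep, and continued to grow afterwards. I only want to keep the measurements at TimeStep 1 and 2 (the first measurement inside the range and the following measurement).

Coral #7 is an example of a coral that started in the size range at TimeStep 1 and remained in the range at TimeStep 2. In this case I only want the first measurement in the size range (TimeStep 1) and the subsequent measurement (TimeStep 2).

Coral #8 is an example of a coral that grew into the size range at TimeStep 3, stayed in the range (10 => 9) at TimeStep 4, then shrank below the desired range, and at TimeStep 6 grew back into the range. For this colony I again want the FIRST measurement inside the range (10 mm at TimeStep 3) and the following measurement at TimeStep 4.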

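In summary, I would like code that filters this database so that three kinds of coral are removed completely: corals whose diameter is within the 8-12 mm range at some point but was larger than the range before that, corals that never come down to the range or below it, and corals that start below the range and never fall within it. In addition, I want to keep corals that grew into the range and later shrank back into it, while dropping the second time they fall into the range. This can be done by removing every measurement except the one at the first TimeStep where the coral grew into the size range and the one at the following TimeStep.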

Sample database
data <- structure(list(Site = c("WAI", "WAI", "WAI", "WAI", "WAI", "WAI", 
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", 
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", 
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", 
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", 
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI"), `Module #` = c(116, 
116, 116, 116, 116, 116, 116, 115, 115, 116, 116, 116, 116, 116, 
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 
116, 116, 116, 116, 116, 116, 116, 116), Side = c("N", "N", "N", 
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", 
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", 
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", 
"N", "N", "N", "N", "N", "N"), TimeStep = c(1, 2, 3, 4, 5, 6, 
1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 
4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6
), Settlement_Area = c(0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336), `Colony #` = c(1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 
5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8), 
    Location = c("C1", "C1", "C1", "C1", "C1", "C1", "B4", "B4", 
    "B4", "B4", "B4", "B4", "A1", "A1", "A1", "A1", "A1", "A1", 
    "B3", "B3", "B3", "B3", "B3", "B3", "D1", "D1", "D1", "D1", 
    "D1", "D1", "A2", "A2", "A2", "A2", "A2", "A2", "A4", "A4", 
    "A4", "A4", "A4", "A4", "B3", "B3", "B3", "B3", "B4", "B5"
    ), `Taxonomic Code` = c("PC", "PC", "PC", "PC", "PC", "PC", 
    "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", 
    "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", 
    "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", 
    "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", 
    "PC", "PC"), `Cover Code` = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1), `Max Diameter (cm)` = c(5, 7, 13, 15, 16, 19, 15, 7, 
    9, 11, 14, 18, 3, 6, 9, 12, 15, 20, 13, 16, 18, 21, 23, 26, 
    6, 9, 14, 12, 15, 18, 11, 14, 17, 17, 21, 24, 9, 11, 14, 
    16, 20, 22, 3, 6, 10, 9, 7, 10), Notes = c("coral # 1 should be deleted from the database because it skipped over the desired size range of 8-12 mm", 
    NA, NA, NA, NA, NA, "coral # 2 should be deleted from the database because it started above the desired size range then shrank back into it.  I only want corals that have grown into the size range", 
    NA, NA, NA, NA, NA, "Colony # 3 is an example of a coral that grew to the size range (8-12 mm) and beyond without shrinking and this is a coral that I want to keep because it grew to the size range.  However, I want to only include the FIRST measure inside the size range (9 mm in this case) and the proceeding measurement (12 mm)", 
    NA, NA, NA, NA, NA, "Colony # 4 is an example of a coral that started off above the size range and therefore needs to be removed.", 
    NA, NA, NA, NA, NA, "Colony # 5 is an example of a coral that started below the range, grew into it, then later shrank back into the range (TimeStep 4). For this scenario, I want to only include the first time the diameter fell into the range (TimeStep 2) and the proceeding measurement, not the second time it fell into the range. This is because the first time is natural growth whereas the second time is shrinkage and its resulting recovery (which I want to exclude or filter out).", 
    NA, NA, NA, NA, NA, "Colony # 6 is an example of a coral that started in the size range for TimeStep 1 and then grew out of it in the next TimeStep and continued to grow after. I want to maintain only the measurements in TimeStep 1 and 2 (the first measure inside the range and the proceeding measurement)", 
    NA, NA, NA, NA, NA, "Colony # 7 is an example of a coral that started in the size range in TimeStep 1 and then remained in the range for TimeStep 2. In this case I only want the first measurement in the size range (TimeStep 1) and the subsequent measurement (TimeStep 2)", 
    NA, NA, NA, NA, NA, "Colony # 8 is an example of a coral that grew to the size range in TimeStep 3, stayed in the range (10 => 9) in TimeStep 4, then shrank below the desired range then for TimeStep 6 grew back to the range. For this colony, again I want the FIRST measurement inside the range (10 mm at TimeStep 3) and the proceeding measurement in TimeStep 4 included for this coral", 
    NA, NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -48L), spec = structure(list(
    cols = list(Site = structure(list(), class = c("collector_character", 
    "collector")), `Module #` = structure(list(), class = c("collector_double", 
    "collector")), Side = structure(list(), class = c("collector_character", 
    "collector")), TimeStep = structure(list(), class = c("collector_double", 
    "collector")), Settlement_Area = structure(list(), class = c("collector_double", 
    "collector")), `Colony #` = structure(list(), class = c("collector_double", 
    "collector")), Location = structure(list(), class = c("collector_character", 
    "collector")), `Taxonomic Code` = structure(list(), class = c("collector_character", 
    "collector")), `Cover Code` = structure(list(), class = c("collector_double", 
    "collector")), `Max Diameter (cm)` = structure(list(), class = c("collector_double", 
    "collector")), Notes = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Desired database
data_final <- structure(list(Site = c("WAI", "WAI", "WAI", "WAI", "WAI", "WAI", 
"WAI", "WAI", "WAI", "WAI"), `Module #` = c(116, 116, 116, 116, 
116, 116, 116, 116, 116, 116), Side = c("N", "N", "N", "N", "N", 
"N", "N", "N", "N", "N"), TimeStep = c(3, 4, 2, 3, 1, 2, 1, 2, 
3, 4), Settlement_Area = c(0.75902336, 0.75902336, 0.75902336, 
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 
0.75902336), `Colony #` = c(3, 3, 5, 5, 6, 6, 7, 7, 8, 8), Location = c("A1", 
"A1", "D1", "D1", "A2", "A2", "A4", "A4", "B3", "B3"), `Taxonomic Code` = c("PC", 
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC"), `Cover Code` = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1), `Max Diameter (cm)` = c(9, 12, 9, 
14, 11, 14, 9, 11, 10, 9), Notes = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), spec = structure(list(cols = list(
    Site = structure(list(), class = c("collector_character", 
    "collector")), `Module #` = structure(list(), class = c("collector_double", 
    "collector")), Side = structure(list(), class = c("collector_character", 
    "collector")), TimeStep = structure(list(), class = c("collector_double", 
    "collector")), Settlement_Area = structure(list(), class = c("collector_double", 
    "collector")), `Colony #` = structure(list(), class = c("collector_double", 
    "collector")), Location = structure(list(), class = c("collector_character", 
    "collector")), `Taxonomic Code` = structure(list(), class = c("collector_character", 
    "collector")), `Cover Code` = structure(list(), class = c("collector_double", 
    "collector")), `Max Diameter (cm)` = structure(list(), class = c("collector_double", 
    "collector")), Notes = structure(list(), class = c("collector_logical", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

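So far, by creating a vector of the unique colony numbers that fall within 8 to 12 mm, I have been able to remove the corals that were never in the size range: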

library(dplyr)
# size_vect is not defined anywhere in the post; assumed here to hold the range bounds
size_vect <- c(8, 12)

ID_vect <- data %>%
  # keep every measurement inside the size range (bounds inclusive, since 12 counts as in range)
  filter(between(`Max Diameter (cm)`, min(size_vect), max(size_vect))) %>%
  # remove duplicate colony numbers
  distinct(`Colony #`) %>%
  # turn the `Colony #` column into a plain vector
  pull(`Colony #`)

I then filtered the full sample database to include only the coral colonies in ID_vect:

data_new <- data %>%
  group_by(`Colony #`) %>%
  # keep only the rows whose colony number appears in ID_vect
  filter(`Colony #` %in% ID_vect)

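What I don't know is how to filter the database on the following condition: if a coral falls into the size range at some point, but an earlier measurement was larger than the maximum of the desired size range (12 mm), the coral should be removed completely. For example, coral #2 should be removed because it was 15 mm at TimeStep 1, above the range, before its value fell into the range at TimeStep 3. I don't know how to code this conditional filter, and this is where I need help. Any code suggestions are appreciated, thank you!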


r filter dplyr time-series measurement
1 answer
0
votes

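We can use run-length encoding to help us keep track of the transitions from inside the range to outside it. This is much easier with data.table::rleid, and I recommend using it.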
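The code that accompanied this answer is not shown above. As a rough sketch of the idea (not necessarily the answerer's exact code), each measurement can be labelled as below, in, or above the range, and `data.table::rleid` can number the consecutive runs of those labels. The names `toy`, `lo`, `hi`, `state`, `run`, `first_pos` and `first_run` are made up for this illustration, the bounds are assumed to be inclusive, and `toy` holds only the colony numbers, time steps and diameters copied from the sample database.

```r
library(dplyr)

lo <- 8   # assumed lower bound of the size range (inclusive)
hi <- 12  # assumed upper bound of the size range (inclusive)

# Stand-in for the sample database: only the three columns the filter needs
toy <- data.frame(
  `Colony #` = rep(1:8, each = 6),
  TimeStep = rep(1:6, times = 8),
  `Max Diameter (cm)` = c(
    5, 7, 13, 15, 16, 19,
    15, 7, 9, 11, 14, 18,
    3, 6, 9, 12, 15, 20,
    13, 16, 18, 21, 23, 26,
    6, 9, 14, 12, 15, 18,
    11, 14, 17, 17, 21, 24,
    9, 11, 14, 16, 20, 22,
    3, 6, 10, 9, 7, 10
  ),
  check.names = FALSE
)

result <- toy %>%
  group_by(`Colony #`) %>%
  arrange(TimeStep, .by_group = TRUE) %>%
  mutate(
    # label each measurement relative to the size range
    state = case_when(
      `Max Diameter (cm)` < lo ~ "below",
      `Max Diameter (cm)` > hi ~ "above",
      TRUE ~ "in"
    ),
    # run id that increases by one every time the label changes
    run = data.table::rleid(state),
    # row of the first in-range measurement (NA if the colony is never in range)
    first_pos = match("in", state),
    # run id of that first in-range run
    first_run = run[first_pos]
  ) %>%
  # drop colonies that are never in the range
  filter(!is.na(first_pos)) %>%
  # drop colonies that were above the range before their first in-range run
  filter(!any(state == "above" & run < first_run)) %>%
  # keep the first in-range row and the row right after it
  filter(row_number() %in% c(first_pos[1], first_pos[1] + 1L)) %>%
  ungroup() %>%
  select(`Colony #`, TimeStep, `Max Diameter (cm)`)
```

On the sample diameters this keeps colonies 3, 5, 6, 7 and 8 with two rows each, matching the TimeStep and diameter columns of the desired database. Calling `data.table::rleid` with the namespace prefix avoids attaching data.table, which would otherwise mask dplyr's `between`, `first` and `last`.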
