Condition comparing two columns


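I have a data frame with four columns. The first holds the county name, the second the period of measurement, the third the actual measured value (IPC class), and the fourth the forecast value (Forecast). Both the actual and the forecast values range from 1 to 5. These are the first 32 rows of the data frame, sorted by county: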

structure(list(County = c("Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo", "Baringo", "Baringo", 
"Baringo", "Baringo", "Baringo", "Baringo"), `Period of measurement Kenya` = c("2011-01", 
"2011-04", "2011-07", "2011-10", "2012-01", "2012-04", "2012-07", 
"2012-10", "2013-01", "2013-04", "2013-07", "2013-10", "2014-01", 
"2014-04", "2014-07", "2014-10", "2015-01", "2015-04", "2015-07", 
"2015-10", "2016-02", "2016-06", "2016-10", "2017-02", "2017-06", 
"2017-10", "2018-02", "2018-06", "2018-10", "2018-12", "2019-02", 
"2019-06"), `IPC class` = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2
), Forecast = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1)), row.names = c(1L, 
48L, 95L, 142L, 189L, 236L, 283L, 330L, 377L, 424L, 471L, 518L, 
565L, 612L, 659L, 706L, 753L, 800L, 847L, 894L, 941L, 988L, 1035L, 
1082L, 1129L, 1176L, 1223L, 1270L, 1317L, 1364L, 1411L, 1458L
), class = "data.frame") 

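For my report I need to know how many crisis transitions occurred during the period I am studying, and how many of those crisis transitions were misforecast. A crisis transition is when the value in the actual column (IPC class) changes from 1 or 2 to 3, 4 or 5. In this part of the data frame you can see that Baringo county had 1 crisis transition. To count the transitions, I used the following code: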

library(dplyr)

# Flag crisis periods (IPC class 3-5), then count 0 -> 1 steps per county
SUB_count_cristrans_KE <- long.SUB_dfCSKE_tot %>% mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  summarize(SUB_crisis_trans_count = sum(diff(crisis) > 0))
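For comparison, here is a minimal base-R sketch of the same count, swapping dplyr for `order()` and `tapply()`. It assumes the column names shown above; `count_crisis_transitions` and `kenya_ipc` are hypothetical names, and the IPC values are the Baringo rows above plus the Garissa subset shown further down. It reproduces the single Baringo transition mentioned in the question.

```r
# Hypothetical helper (base R, no dplyr): count crisis transitions per county,
# i.e. steps from IPC class 1-2 to IPC class 3-5 between consecutive periods.
count_crisis_transitions <- function(df) {
  # Sort by county, then period; "YYYY-MM" strings sort chronologically
  df <- df[order(df$County, df[["Period of measurement Kenya"]]), ]
  crisis <- as.integer(df[["IPC class"]] %in% 3:5)  # 1 = crisis period
  # Within each county, count 0 -> 1 steps of the crisis flag
  counts <- tapply(crisis, df$County, function(x) sum(diff(x) > 0))
  data.frame(County = names(counts), crisis_trans_count = as.vector(counts))
}

periods <- c("2011-01", "2011-04", "2011-07", "2011-10", "2012-01", "2012-04",
             "2012-07", "2012-10", "2013-01", "2013-04", "2013-07", "2013-10",
             "2014-01", "2014-04", "2014-07", "2014-10", "2015-01", "2015-04",
             "2015-07", "2015-10", "2016-02", "2016-06", "2016-10", "2017-02",
             "2017-06", "2017-10", "2018-02", "2018-06", "2018-10", "2018-12",
             "2019-02", "2019-06")
kenya_ipc <- data.frame(
  County = rep(c("Baringo", "Garissa"), each = 32),
  `Period of measurement Kenya` = rep(periods, 2),
  `IPC class` = c(rep(2, 20), 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2,   # Baringo
                  2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,   # Garissa
                  3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2),
  check.names = FALSE  # keep the column names with spaces
)
res <- count_crisis_transitions(kenya_ipc)
print(res)
```

Baringo gets 1 transition (2017-06) and Garissa gets 5.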

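A misforecast crisis transition is a crisis transition where the Forecast column does not match the IPC class column, i.e. the forecast is not also 3, 4 or 5. As you can see in this part of the data frame, Baringo's crisis transition was misforecast, because the value in the Forecast column is not 3, 4 or 5. So my question is: what is the correct condition in the `ifelse` function to count the misforecast crisis transitions per county? In other words, it first has to check whether a period is a crisis transition, i.e. the value goes from 1 or 2 to 3, 4 or 5. If so, it has to check whether the value in the Forecast column is 3, 4 or 5. If it is not, the crisis transition was not foreseen. The code I have now is:

SUB_count_crismiss_KE <- long.SUB_dfCSKE_tot %>% mutate(crisis_miss = ifelse(`IPC class` %in% 3:5 & (!Forecast %in% 3:5), 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  summarize(SUB_crisis_miss_count_KE = sum(diff(crisis_miss) > 0))

Let me know if anything needs to be added or clarified! Thanks in advance.

Below I have highlighted Garissa county to make it clearer what I want to solve or achieve. ;)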

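A crisis transition occurred between 2011-04 and 2011-07: the IPC value went from 2 to 3. Between 2011-07 and 2011-10, however, there was no crisis transition, because the IPC value stayed at 3. Now for the misforecast part. The crisis transition between 2011-04 and 2011-07 was forecast correctly, since the forecast value was 3, 4 or 5. The forecast value for 2011-10 is wrong, but because no crisis transition occurred there, it should not be counted. So how can I skip the forecast values when no crisis transition occurs? I hope this is clearer now.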
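The rule described above can be checked by hand on the first four Garissa periods (2011-01 to 2011-10) from the subset below. This is a minimal base-R sketch, assuming a misforecast is a transition period whose forecast is not 3, 4 or 5; the variable names are illustrative only.

```r
# First four Garissa periods (2011-01 to 2011-10)
ipc      <- c(2, 2, 3, 3)
forecast <- c(3, 2, 3, 2)

crisis <- ipc %in% 3:5
# A transition is a crisis period whose previous period was not a crisis;
# the first period has no predecessor, so it cannot be a transition
transition <- c(FALSE, crisis[-1] & !crisis[-length(crisis)])
# Look at the forecast only where there is a transition; skip all other rows
misforecast <- transition & !(forecast %in% 3:5)

print(transition)   # FALSE FALSE TRUE FALSE
print(misforecast)  # FALSE FALSE FALSE FALSE
```

Only 2011-07 is a transition, and its forecast of 3 is correct. The wrong forecast for 2011-10 is ignored because `transition` is `FALSE` there.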

Subset for Garissa county:

> subset(sorted_long.SUB_dfCSKE_tot, County=="Garissa")
      County Period of measurement Kenya IPC class Forecast
7    Garissa                     2011-01         2        3
54   Garissa                     2011-04         2        2
101  Garissa                     2011-07         3        3
148  Garissa                     2011-10         3        2
195  Garissa                     2012-01         2        2
242  Garissa                     2012-04         2        2
289  Garissa                     2012-07         3        3
336  Garissa                     2012-10         3        2
383  Garissa                     2013-01         2        2
430  Garissa                     2013-04         2        2
477  Garissa                     2013-07         2        2
524  Garissa                     2013-10         2        2
571  Garissa                     2014-01         2        2
618  Garissa                     2014-04         2        2
665  Garissa                     2014-07         2        2
712  Garissa                     2014-10         3        2
759  Garissa                     2015-01         3        2
806  Garissa                     2015-04         3        2
853  Garissa                     2015-07         2        2
900  Garissa                     2015-10         2        2
947  Garissa                     2016-02         2        2
994  Garissa                     2016-06         2        2
1041 Garissa                     2016-10         2        2
1088 Garissa                     2017-02         3        2
1135 Garissa                     2017-06         3        3
1182 Garissa                     2017-10         2        3
1229 Garissa                     2018-02         3        2
1276 Garissa                     2018-06         1        3
1323 Garissa                     2018-10         1        1
1370 Garissa                     2018-12         2        1
1417 Garissa                     2019-02         2        2
1464 Garissa                     2019-06         2        2
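Applying the current attempt to this Garissa subset shows why it over-counts: `diff(crisis_miss) > 0` fires whenever a run of "crisis, but forecast not in crisis" starts, including periods such as 2011-10 and 2012-10 where the crisis was already ongoing. A base-R sketch (no dplyr; variable names are illustrative) comparing it with a rule that inspects the forecast only at transitions:

```r
# Garissa IPC classes and forecasts, in period order (from the subset above)
ipc <- c(2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,
         3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2)
forecast <- c(3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2)

# The question's current condition: crisis now, forecast not in crisis
crisis_miss <- ifelse(ipc %in% 3:5 & !(forecast %in% 3:5), 1, 0)
attempt_count <- sum(diff(crisis_miss) > 0)

# Intended rule: inspect the forecast only at crisis transitions
crisis <- ipc %in% 3:5
transition <- c(FALSE, crisis[-1] & !crisis[-length(crisis)])
intended_count <- sum(transition & !(forecast %in% 3:5))

print(attempt_count)   # 5
print(intended_count)  # 3
```

The two extra counts come from 2011-10 and 2012-10; the three genuine misforecasts are the transitions in 2014-10, 2017-02 and 2018-02.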
Answer

I have now created a variable `data` containing the Garissa data (to keep the name simple), taken from the output of `dput(sorted_long.SUB_dfCSKE_tot[193:224,])`:

data <- structure(list(County = c("Garissa", "Garissa", "Garissa", "Garissa",
  "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa",
  "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa",
  "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa",
  "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa", "Garissa"),
  `Period of measurement Kenya` = c("2011-01", "2011-04", "2011-07", "2011-10",
  "2012-01", "2012-04", "2012-07", "2012-10", "2013-01", "2013-04", "2013-07",
  "2013-10", "2014-01", "2014-04", "2014-07", "2014-10", "2015-01", "2015-04",
  "2015-07", "2015-10", "2016-02", "2016-06", "2016-10", "2017-02", "2017-06",
  "2017-10", "2018-02", "2018-06", "2018-10", "2018-12", "2019-02", "2019-06"),
  `IPC class` = c(2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2, 2,
  2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2),
  Forecast = c(3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2)),
  row.names = c(7L, 54L, 101L, 148L, 195L, 242L, 289L, 336L, 383L, 430L, 477L,
  524L, 571L, 618L, 665L, 712L, 759L, 806L, 853L, 900L, 947L, 994L, 1041L,
  1088L, 1135L, 1182L, 1229L, 1276L, 1323L, 1370L, 1417L, 1464L),
  class = "data.frame")

Then, if I understand you correctly, you want to count a misforecast only when an actual transition occurs. If there is no transition, there is by definition no misforecast (or we don't care about those cases). In that case, I think this does what you need (the intermediate `data1` and `summary` steps could of course be combined into one long pipe). Also, for clarity, the `data` data frame above is the same as the Garissa subset you provided.

library(dplyr)

data1 <- data %>%
  mutate(crisis = ifelse(`IPC class` %in% 3:5, 1, 0)) %>%
  arrange(County, `Period of measurement Kenya`) %>%
  group_by(County) %>%
  # lag() is NA in each county's first row, so both flags are NA there
  mutate(crisis_trans = (crisis - lag(crisis)) > 0,        # IPC moved into crisis
         crisis_trans_f = (Forecast - lag(Forecast)) > 0,  # forecast rose
         misforecast = case_when(
           crisis_trans & crisis_trans_f ~ FALSE,
           crisis_trans & !crisis_trans_f ~ TRUE,
           TRUE ~ FALSE                                    # no transition
         ))

summary <- data1 %>%
  group_by(County) %>%
  summarise(n_transitions = sum(crisis_trans, na.rm = TRUE),
            n_misforecast = sum(misforecast))

The logic is that we first create the transitions and the forecast transitions. Then, if and only if there is a transition, we classify it as a misforecast when the forecast does not predict that transition. All other cases are classified as "not misforecast". You don't necessarily need `case_when`, but I like it because it makes very clear what is going on.
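A base-R sketch of the same per-county logic (no dplyr), written as a hypothetical helper `count_misforecasts`. It computes the answer's rule (`crisis_trans_f`: the forecast rose compared with the previous period) next to the rule stated in the question (the forecast for the transition period is 3, 4 or 5). On the Garissa data both rules give 5 transitions and 3 misforecasts. On the Baringo data they differ: the answer's rule treats the 2017-06 transition as correctly forecast because the forecast rose from 1 to 2, whereas the question's definition counts it as a misforecast. Which rule fits is a judgment call; the question's wording points to the second.

```r
# Hypothetical helper: per-county transitions and misforecasts in base R.
count_misforecasts <- function(df) {
  period <- df[["Period of measurement Kenya"]]
  df <- df[order(df$County, period), ]
  per_county <- lapply(split(df, df$County), function(d) {
    crisis <- d[["IPC class"]] %in% 3:5
    n <- nrow(d)
    # TRUE where the previous period was not a crisis and this one is
    trans <- c(FALSE, crisis[-1] & !crisis[-n])
    # Answer's rule: the forecast increased relative to the previous period
    trans_f <- c(FALSE, diff(d$Forecast) > 0)
    # Question's rule: the forecast for this period is 3, 4 or 5
    fc_crisis <- d$Forecast %in% 3:5
    data.frame(County = d$County[1],
               n_transitions = sum(trans),
               n_misforecast_answer = sum(trans & !trans_f),
               n_misforecast_question = sum(trans & !fc_crisis))
  })
  do.call(rbind, per_county)
}

periods <- c("2011-01", "2011-04", "2011-07", "2011-10", "2012-01", "2012-04",
             "2012-07", "2012-10", "2013-01", "2013-04", "2013-07", "2013-10",
             "2014-01", "2014-04", "2014-07", "2014-10", "2015-01", "2015-04",
             "2015-07", "2015-10", "2016-02", "2016-06", "2016-10", "2017-02",
             "2017-06", "2017-10", "2018-02", "2018-06", "2018-10", "2018-12",
             "2019-02", "2019-06")
kenya <- data.frame(
  County = rep(c("Baringo", "Garissa"), each = 32),
  `Period of measurement Kenya` = rep(periods, 2),
  `IPC class` = c(rep(2, 20), 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2,       # Baringo
                  2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3,       # Garissa
                  3, 3, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2, 2),
  Forecast = c(rep(2, 16), 1, 1, 2, 2, 1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 2, 1, # Baringo
               3, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,           # Garissa
               2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 1, 1, 2, 2),
  check.names = FALSE  # keep the column names with spaces
)
res <- count_misforecasts(kenya)
print(res)
```

Swapping `trans_f` for `fc_crisis` in the dplyr version amounts to replacing `crisis_trans_f` with `Forecast %in% 3:5`.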
