Why does filtering with a variable behave differently from filtering with a literal value?


R is behaving strangely here:

> country = 2
> Polity_data %>% filter(as.integer(ccode) == country) %>% head()
# A tibble: 0 × 37
# ℹ 37 variables: p5 <dbl>, cyear <dbl>, ccode <dbl>, scode <chr>, country <chr>, year <dbl>, flag <dbl>,
#   fragment <dbl>, democ <dbl>, autoc <dbl>, polity <dbl>, polity2 <dbl>, durable <dbl>, xrreg <dbl>,
#   xrcomp <dbl>, xropen <dbl>, xconst <dbl>, parreg <dbl>, parcomp <dbl>, exrec <dbl>, exconst <dbl>,
#   polcomp <dbl>, prior <dbl>, emonth <dbl>, eday <dbl>, eyear <dbl>, eprec <dbl>, interim <dbl>, bmonth <dbl>,
#   bday <dbl>, byear <dbl>, bprec <dbl>, post <dbl>, change <dbl>, d5 <dbl>, sf <dbl>, regtrans <dbl>

But when I use the country code directly, the result is not empty, which is correct.

> Polity_data %>% filter(as.integer(ccode) == 2) %>% head()
# A tibble: 6 × 37
     p5 cyear ccode scode country  year  flag fragment democ autoc polity polity2 durable xrreg xrcomp xropen xconst
  <dbl> <dbl> <dbl> <chr> <chr>   <dbl> <dbl>    <dbl> <dbl> <dbl>  <dbl>   <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1     1 21776     2 USA   United…  1776     0        0   -77   -77    -77       0       0   -77    -77    -77    -77
2     1 21777     2 USA   United…  1777     0        0   -77   -77    -77       0       0   -77    -77    -77    -77
3     1 21778     2 USA   United…  1778     0        0   -77   -77    -77       0       0   -77    -77    -77    -77
4     1 21779     2 USA   United…  1779     0        0   -77   -77    -77       0       0   -77    -77    -77    -77
5     1 21780     2 USA   United…  1780     0        0   -77   -77    -77       0       0   -77    -77    -77    -77
6     1 21781     2 USA   United…  1781     0        0   -77   -77    -77       0       0   -77    -77    -77    -77
# ℹ 20 more variables: parreg <dbl>, parcomp <dbl>, exrec <dbl>, exconst <dbl>, polcomp <dbl>, prior <dbl>,
#   emonth <dbl>, eday <dbl>, eyear <dbl>, eprec <dbl>, interim <dbl>, bmonth <dbl>, bday <dbl>, byear <dbl>,
#   bprec <dbl>, post <dbl>, change <dbl>, d5 <dbl>, sf <dbl>, regtrans <dbl>


Searching for NA values did not help:

> Polity_data %>%
+     mutate(ccode_is_numeric = !is.na(as.numeric(as.character(ccode)))) %>%
+     filter(ccode_is_numeric == FALSE)
# A tibble: 0 × 38
# ℹ 38 variables: p5 <dbl>, cyear <dbl>, ccode <dbl>, scode <chr>, country <chr>, year <dbl>, flag <dbl>,
#   fragment <dbl>, democ <dbl>, autoc <dbl>, polity <dbl>, polity2 <dbl>, durable <dbl>, xrreg <dbl>,
#   xrcomp <dbl>, xropen <dbl>, xconst <dbl>, parreg <dbl>, parcomp <dbl>, exrec <dbl>, exconst <dbl>,
#   polcomp <dbl>, prior <dbl>, emonth <dbl>, eday <dbl>, eyear <dbl>, eprec <dbl>, interim <dbl>, bmonth <dbl>,
#   bday <dbl>, byear <dbl>, bprec <dbl>, post <dbl>, change <dbl>, d5 <dbl>, sf <dbl>, regtrans <dbl>,
#   ccode_is_numeric <lgl>

The data types are also the same:

> typeof(country)
[1] "double"
> typeof(Polity_data$ccode)
[1] "double"

However, a direct comparison works:

> typeof(sort(unique(Polity_data$ccode)))
[1] "double"
> country
[1] 2
> sort(unique(Polity_data$ccode))[1]
[1] 2
> sort(unique(Polity_data$ccode))[1]==2
[1] TRUE
> sort(unique(Polity_data$ccode))[1]==country
[1] TRUE

Any ideas? I'm not a very experienced programmer, and right now I'm really confused...

dplyr filter
1 Answer

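The integer type is not the cause: in R, `2 == 2L` is TRUE, and your filter with the literal 2 already works, so writing 2L or using as.numeric will not change anything. The likely cause is that Polity_data has a column that is also named country (it shows up as `country <chr>` in your output). Inside filter(), dplyr looks up names in the data frame first, so `country` refers to that character column, not to your variable. Comparing the integer codes with country names such as "United States" never matches, which is why you get 0 rows. Either give the variable a name that is not a column name (for example country_code), or tell dplyr explicitly to use the variable with `.env$country` or `!!country`.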
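
Inside filter(), a bare name is resolved to a column of the data before a variable of the same name is considered. Here is a minimal sketch of that effect; `polity_demo` is a small made-up tibble standing in for Polity_data, assumed only to share the `ccode` and `country` columns with it:

```r
library(dplyr)

# Hypothetical stand-in for Polity_data: like the real data,
# it has a character column named `country`.
polity_demo <- tibble(
  ccode   = c(2, 2, 20),
  country = c("United States", "United States", "Canada")
)

country <- 2

# Here `country` resolves to the character column, not the variable,
# so the codes are compared with country names and no row matches.
masked <- polity_demo %>% filter(as.integer(ccode) == country)

# .env$country looks the name up in the calling environment instead.
fixed_env <- polity_demo %>% filter(as.integer(ccode) == .env$country)

# !!country injects the variable's value before filter() evaluates.
fixed_inject <- polity_demo %>% filter(as.integer(ccode) == !!country)

nrow(masked)
nrow(fixed_env)
nrow(fixed_inject)
```

If this matches what happens with the real data, renaming the variable (say, to `country_code`) would probably be the simplest fix, since it avoids the name clash entirely.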
