R - Merge two data frames on two columns with an additional "value between" condition?


I have two data frames, shown below.

Data frame A:

code1 code2 element1 from to
c1a    c2a     e1a     1    15
c1a    c2a     e1b     17   50
c1a    c2b     e1c     14   67
c1b    c2c     e1d     1    20
c1b    c2d     e1e    40   60

Data frame B:

code1 code2 element2 number
c1a    c2a     e2a    7
c1a    c2a     e2b    10
c1a    c2a     e2c    35

I basically need to join them on code1 and code2, keeping only the rows where from <= number <= to, to get something like this:

RESULT DATAFRAME

(This is only a fragment, because my mock data is too small. I want to perform this merge on the two full data frames A and B.)

code1 code2 element1 element2 from   to   number
c1a    c2a     e1a    e2a      1     15     7
c1a    c2a     e1a    e2b      1     15     10
c1a    c2a     e1b    e2c      17    50     35

I could do this with a for loop and check each row manually, but I was wondering whether there is a more "elegant" way to do it?

r dataframe merge
2 Answers
2 votes

You can join the data on code1 and code2 and then filter for the rows where number falls within the range.

You can do this in dplyr:

library(dplyr)
left_join(B, A, by = c('code1', 'code2')) %>% 
    filter(number >= from & number <= to)

#  code1 code2 element2 number element1 from to
#1   c1a   c2a      e2a      7      e1a    1 15
#2   c1a   c2a      e2b     10      e1a    1 15
#3   c1a   c2a      e2c     35      e1b   17 50

Or in base R:

subset(merge(B, A, by = c('code1', 'code2')), number >= from & number <= to)
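
The one-liner above can be unpacked into separate steps. The following is a minimal, self-contained sketch in base R that rebuilds the mock data from the question; the intermediate names `joined` and `result` are illustrative and do not come from the original answer.

```r
# Mock data copied from the question.
A <- data.frame(
  code1    = c("c1a", "c1a", "c1a", "c1b", "c1b"),
  code2    = c("c2a", "c2a", "c2b", "c2c", "c2d"),
  element1 = c("e1a", "e1b", "e1c", "e1d", "e1e"),
  from     = c(1L, 17L, 14L, 1L, 40L),
  to       = c(15L, 50L, 67L, 20L, 60L)
)
B <- data.frame(
  code1    = c("c1a", "c1a", "c1a"),
  code2    = c("c2a", "c2a", "c2a"),
  element2 = c("e2a", "e2b", "e2c"),
  number   = c(7L, 10L, 35L)
)

# Step 1: equi-join on the two code columns. Every row of B is paired with
# every row of A that has the same code1/code2 combination.
joined <- merge(B, A, by = c("code1", "code2"))

# Step 2: keep only the pairs where number lies in the closed interval [from, to].
result <- subset(joined, from <= number & number <= to)

# Step 3: reorder the columns to match the layout requested in the question
# and reset the row names left over from the intermediate join.
result <- result[, c("code1", "code2", "element1", "element2", "from", "to", "number")]
rownames(result) <- NULL
result
```

Note that the intermediate join holds every pairing within a code1/code2 group before the filter is applied, so on large data frames with many rows per group it can be much bigger than the final result.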

1 vote

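Here is an option using fuzzyjoin::fuzzy_inner_join. From your expected output I understand that, in addition to the criterion from <= number <= to, you also want to join on code1 and code2. The join therefore uses four conditions: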

  1. Join code1 and code2 by equality
  2. Join from and number by the first inequality, i.e. from <= number
  3. Join to and number by the second inequality, i.e. number <= to

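One thing to be aware of with the fuzzy_join functions is that they output all columns from both data frames, so the join columns appear twice (with .x and .y suffixes).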

library(fuzzyjoin)
fuzzy_inner_join(
      df_A, df_B,
      by = c(
        "code1" = "code1",
        "code2" = "code2",
        "from" = "number",
        "to" = "number"),
      match_fun = c(
        "code1" = function(l, r) l == r,
        "code2" = function(l, r) l == r,
        "from" = function(l, r) l <= r,
        "to" = function(l, r) r <= l))

# code1.x code2.x element1 from to code1.y code2.y element2 number
# 1     c1a     c2a      e1a    1 15     c1a     c2a      e2a      7
# 2     c1a     c2a      e1a    1 15     c1a     c2a      e2b     10
# 3     c1a     c2a      e1b   17 50     c1a     c2a      e2c     35
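
If the duplicated key columns are unwanted, they can be dropped afterwards. The following is a minimal sketch, assuming the fuzzyjoin and dplyr packages are installed and that the output columns carry the .x/.y suffixes shown above. The names `joined` and `tidy` are illustrative, and the operators are passed as an unnamed list matched to `by` in order, which is a stylistic variation on the named functions used above.

```r
library(fuzzyjoin)
library(dplyr)

# Same mock data as in the question, rebuilt here so the sketch runs on its own.
df_A <- data.frame(
  code1    = c("c1a", "c1a", "c1a", "c1b", "c1b"),
  code2    = c("c2a", "c2a", "c2b", "c2c", "c2d"),
  element1 = c("e1a", "e1b", "e1c", "e1d", "e1e"),
  from     = c(1L, 17L, 14L, 1L, 40L),
  to       = c(15L, 50L, 67L, 20L, 60L)
)
df_B <- data.frame(
  code1    = c("c1a", "c1a", "c1a"),
  code2    = c("c2a", "c2a", "c2a"),
  element2 = c("e2a", "e2b", "e2c"),
  number   = c(7L, 10L, 35L)
)

# One comparison operator per pair in `by`, in the same order:
# code1 == code1, code2 == code2, from <= number, to >= number.
joined <- fuzzy_inner_join(
  df_A, df_B,
  by = c("code1" = "code1", "code2" = "code2", "from" = "number", "to" = "number"),
  match_fun = list(`==`, `==`, `<=`, `>=`)
)

# Keep a single copy of each key column and arrange the columns in the
# order requested in the question.
tidy <- joined %>%
  select(code1 = code1.x, code2 = code2.x, element1, element2, from, to, number)
tidy
```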

Data

df_A <- structure(list(code1 = c("c1a", "c1a", "c1a", "c1b", "c1b"), 
    code2 = c("c2a", "c2a", "c2b", "c2c", "c2d"), element1 = c("e1a", 
    "e1b", "e1c", "e1d", "e1e"), from = c(1L, 17L, 14L, 1L, 40L
    ), to = c(15L, 50L, 67L, 20L, 60L)), class = "data.frame", row.names = c(NA, -5L))

df_B <- structure(list(code1 = c("c1a", "c1a", "c1a"), code2 = c("c2a", 
"c2a", "c2a"), element2 = c("e2a", "e2b", "e2c"), number = c(7L, 
10L, 35L)), class = "data.frame", row.names = c(NA, -3L))