Compare two tables in R

Question

I have a table of reference positions, where x is the start and y is the end of each interval.

|---------------------|------------------|
|           x         |        y         |
|---------------------|------------------|
|          10         |         35       |
|---------------------|------------------|
|          58         |         89       |
|---------------------|------------------|

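Then I have another table with single positions. My goal is to check whether any position in the second table is covered by the first table, meaning that it lies between the two columns (col1 and col2, here x and y) of some row.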

|---------------------|
|          12         |     
|---------------------|
|          27         |       
|---------------------|
|          65         |
|---------------------|

How can I check this? I can't use any of the joins from dplyr, or even unique.

Answer 1

We can use foverlaps from data.table:

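library(data.table)
# Intervals: x is the start, y is the end
df1 <- data.frame(x = c(10, 58), y = c(35, 89))
# Represent each single position as a zero-width interval (x == y)
df2 <- data.frame(x = c(12, 27, 65), y = c(12, 27, 65))
# Key both tables on their interval columns before calling foverlaps()
setDT(df1, key = c('x', 'y'))
setDT(df2, key = c('x', 'y'))
# yid is the row of df1 whose interval contains each position in df2
foverlaps(df2, df1, type = "within", which = TRUE)$yid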
#[1] 1 1 2
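
As a cross-check of the result above, the same containment test can be sketched in base R without any joins. This is only an illustration: the names starts, ends, pos, inside, hit and any_hit are made up for this example, and the extra position 90 is added here as an assumption so that a non-matching case is visible.

```r
# Interval bounds from the first table (x = start, y = end)
starts <- c(10, 58)
ends   <- c(35, 89)

# Single positions from the second table, plus 90 as an illustrative non-match
pos <- c(12, 27, 65, 90)

# inside[i, j] is TRUE when pos[i] lies within [starts[j], ends[j]]
inside <- outer(pos, starts, ">=") & outer(pos, ends, "<=")

# Index of the first interval containing each position, NA if none does
hit <- apply(inside, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)

# Logical flag: is the position covered by any interval?
any_hit <- rowSums(inside) > 0

print(hit)      # 1 1 2 NA
print(any_hit)  # TRUE TRUE TRUE FALSE
```

The outer() approach builds a full positions-by-intervals matrix, so it suits small tables; for large inputs the data.table approach is likely the better fit.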

Answer 2

Version 1.9.8 of data.table (on CRAN 25 Nov 2016) introduced non-equi joins, which can be used instead of foverlaps():


library(data.table)
# df1 and df2 are defined in the Data section below
setDT(df1)[setDT(df2), on = .(x <= z, y >= z), which = TRUE]
# [1]  1  1  2 NA

Note that the second table differs from the OP's data: a fourth row has been added that does not match any interval.

Data

df1 <- data.frame(x = c(10, 58), y = c(35, 89))
df2 <- data.frame(z = c(12, 27, 65, 90))
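
If a yes/no answer per position is wanted rather than the matching row number, the index vector from the non-equi join can be turned into a logical column. The following is a minimal sketch under the assumption that the intervals in df1 do not overlap, so each position yields exactly one result; idx and in_interval are names introduced here for illustration only.

```r
library(data.table)

df1 <- data.table(x = c(10, 58), y = c(35, 89))
df2 <- data.table(z = c(12, 27, 65, 90))

# Row of df1 whose interval [x, y] contains each z; NA when no interval matches.
# Assumes non-overlapping intervals, so idx has one entry per row of df2.
idx <- df1[df2, on = .(x <= z, y >= z), which = TRUE]

# Flag positions that fall inside some interval
df2[, in_interval := !is.na(idx)]
print(df2)
```

With overlapping intervals a position could match several rows and idx would be longer than df2, in which case the flag would need to be computed per position instead.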

Answer 3

You can use diffdf::diffdf:


#' Assumes two data frames, `df.a` and `df.b`, and the magrittr pipe
library(magrittr)

#' Simple diff by row number
df.a %>% diffdf::diffdf (df.b)

#' Order, then diff by row number
data.table::setorderv (df.a) %>% diffdf::diffdf (data.table::setorderv (df.b))

#' Diff by the key(s) you supply.
#' Together, key1, key2, ... should uniquely identify a row in both data frames.
df.a %>% diffdf::diffdf (df.b, keys = c('key1','key2'))

In addition, here is a tool for diffing multiple RDS files at once:

rdses.compare = 
\ (orderf = \ (a) a) 
\ (dirpath.a, dirpath.b) 
\ (keys) (\ (.sep) 
    dirpath.a %>% base::c (dirpath.b) %>% base::`names<-` (.,.) %>% 
        base::lapply (\ (p) p %>% base::list.files (full.names = T)) %>% 
        base::Reduce (\ (a,b) a %>% base::paste (b, sep = .sep), x = .) %>% 
        base::`names<-` (.,.) %>% 
        base::strsplit (.sep) %>% 
        future.apply::future_lapply (\ (x) x %>% 
            base::`names<-` (.,.) %>% 
            base::lapply (base::readRDS) %>% 
            base::lapply (orderf) %>% 
            base::Reduce (\ (a,b) a %>% 
                diffdf::diffdf (b, keys = keys), x = .) %>% 
            {.}) %>% 
        {.}) (" <> ") %>% 
    {.} ;

rdsdirs.compare = 
\ (orderf = data.table::setorderv) 
\ (path.a, path.b) 
\ (dir) (\ (`%rdses.compare%`) 
    path.a %>% base::c (path.b) %>% 
        file.path (dir) %>% 
        {.[1] %rdses.compare% .[2]}
    ) (rdses.compare (orderf)) ;

`%rdses.compare%` = rdses.compare (\ (a) a)
`%rdses.compare.ord%` = rdses.compare (data.table::setorderv)

`%rdsdirs.compare%` = rdsdirs.compare (\ (a) a)
`%rdsdirs.compare.ord%` = rdsdirs.compare (data.table::setorderv)

#' Parallel execution setup
future::plan (future::multisession)

#' Compare two directories that both contain the same number of RDS files with the same names
(dir.a %rdses.compare% dir.b) (keys) -> res

#' Compare the same subdirectory under two different paths:
(path.a %rdsdirs.compare% path.b) ('player_one') (keys) -> res

#' `keys` can be `c('key1','key2',...)` or `NULL`

#' Then drop every report that found no issues (`%<>%` comes from magrittr)
res %<>% base::Filter (\ (i) base::length (i) > 0, x = .)

#' Restore sequential execution and run GC if needed
future::plan (future::sequential); base::gc ();

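This tool requires that the RDS files in the two directories have the same (or at least corresponding) names, and that the directories contain only RDS files.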
