I have a table of reference positions, where x is the start and y is the end of each interval.
|---------------------|------------------|
| x | y |
|---------------------|------------------|
| 10 | 35 |
|---------------------|------------------|
| 58 | 89 |
|---------------------|------------------|
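Then I have another table with single positions. My goal is to check whether any position in the second table falls inside the first table, where a position counts as a match if it lies between col1 and col2 (that is, between x and y).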
|---------------------|
| 12 |
|---------------------|
| 27 |
|---------------------|
| 65 |
|---------------------|
How can I check this? I can't use any of the joins from dplyr, or even unique.
We can use foverlaps from data.table:
library(data.table)
df1 <- data.frame(x = c(10, 58), y = c(35, 89))
df2 <- data.frame(x= c(12, 27, 65), y = c(12, 27, 65))
setDT(df1, key = c('x', 'y'))
setDT(df2, key = c('x', 'y'))
foverlaps(df2, df1, type = "within", which = TRUE)$yid
#[1] 1 1 2
Version 1.9.8 of data.table (on CRAN 25 Nov 2016) introduced non-equi joins, which can be used instead of foverlaps():
setDT(df1)[setDT(df2), on = .(x <= z, y >= z), which = TRUE]
#[1] 1 1 2 NA
Note that the second table differs from the OP's data: a fourth row has been added that does not match any interval. The data used here is:
df1 <- data.frame(x = c(10, 58), y = c(35, 89))
df2 <- data.frame(z = c(12, 27, 65, 90))
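For readers who cannot load data.table either, the same interval check can be sketched in base R. This is only an illustrative alternative, not part of the answers above; the column names in_any and interval are made up for this example, and both interval bounds are assumed to be inclusive, as in the non-equi join.

```r
# Reference intervals and query positions (same data as in the non-equi join example)
df1 <- data.frame(x = c(10, 58), y = c(35, 89))
df2 <- data.frame(z = c(12, 27, 65, 90))

# hits[i, j] is TRUE when position i lies inside interval j (bounds inclusive)
hits <- outer(df2$z, df1$x, ">=") & outer(df2$z, df1$y, "<=")

# TRUE if the position falls inside at least one interval (hypothetical column name)
df2$in_any <- rowSums(hits) > 0

# Row index in df1 of the first matching interval, NA when there is none
df2$interval <- apply(hits, 1, function(h) if (any(h)) which(h)[1] else NA_integer_)

print(df2)
```

With this data, interval comes out as 1 1 2 NA, which agrees with the join result. The outer() calls build a full positions-by-intervals matrix, so this is fine for small tables but will use a lot of memory on large ones, where foverlaps or a non-equi join is the better fit.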
diffdf::diffdf:
library(magrittr)  # provides %>% (and %<>%, used further down)
#' given two data frames: `df.a` and `df.b`
#' simple diff by row number
df.a %>% diffdf::diffdf (df.b)
#' order first, then diff by row number
data.table::setorderv (df.a) %>% diffdf::diffdf (data.table::setorderv (df.b))
#' diff by the key(s) you give
#' the concatenation of key1, key2, ... should form the row key of both data frames.
df.a %>% diffdf::diffdf (df.b, keys = c('key1','key2'))
And here is a tool for diffing multiple rds files at once:
rdses.compare =
\ (orderf = \ (a) a)
\ (dirpath.a, dirpath.b)
\ (keys) (\ (.sep)
dirpath.a %>% base::c (dirpath.b) %>% base::`names<-` (.,.) %>%
base::lapply (\ (p) p %>% base::list.files (full.names = T)) %>%
base::Reduce (\ (a,b) a %>% base::paste (b, sep = .sep), x = .) %>%
base::`names<-` (.,.) %>%
base::strsplit (.sep) %>%
future.apply::future_lapply (\ (x) x %>%
base::`names<-` (.,.) %>%
base::lapply (base::readRDS) %>%
base::lapply (orderf) %>%
base::Reduce (\ (a,b) a %>%
diffdf::diffdf (b, keys = keys), x = .) %>%
{.}) %>%
{.}) (" <> ") %>%
{.} ;
rdsdirs.compare =
\ (orderf = data.table::setorderv)
\ (path.a, path.b)
\ (dir) (\ (`%rdses.compare%`)
path.a %>% base::c (path.b) %>%
file.path (dir) %>%
{.[1] %rdses.compare% .[2]}
) (rdses.compare (orderf)) ;
`%rdses.compare%` = rdses.compare (\ (a) a)
`%rdses.compare.ord%` = rdses.compare (data.table::setorderv)
`%rdsdirs.compare%` = rdsdirs.compare (\ (a) a)
`%rdsdirs.compare.ord%` = rdsdirs.compare (data.table::setorderv)
#' Parallel run setting
future::plan (future::multisession)
#' Compare two dirs which both have the same count and names of RDS files
(dir.a %rdses.compare% dir.b) (keys) -> res
#' Compare two same dir at two different path:
(path.a %rdsdirs.compare% path.b) ('player_one') (keys) -> res
#' `keys` can be `c('key1','key2',...)` or `NULL`
#'
#' Then you can filter out all no-issue reports
res %<>% base::Filter (\ (i) base::length (i) > 0, x = .)
#' GC if you need
future::plan (future::sequential); base::gc ();
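This tool requires you to make sure that the rds files in the two directories have the same (or at least similar) names, and that the directories contain only rds files.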