Using R to automatically drop the bijection columns identified by whichAreBijection (dataPreparation package) from a data frame

Question (votes: 0, answers: 1)

I am having trouble with what should be a simple task.

Say I load these libraries and have this data frame:

library(tidyverse)
library(dataPreparation)

df <- data.frame(col1 = 1, col2 = rnorm(1e1), col3 = sample(c(1, 2), 1e1, replace = TRUE))
df$col4 <- df$col2
df$col5[df$col3 == 1] = "a"
df$col5[df$col3 == 2] = "b"
df$col6 = c("b","b","a","a","b","a","a","a","a","b")
df$col7 = "d"
df$col8 = c(3,3,5,5,3,5,5,5,5,3)
df$col9 = c("x","x","y","y","x","y","y","y","y","x")
df$col10 = c("p","p","q","p","q","q","p","p","q","q")
df$col11 = c(10.5,10.5,11.37,10.5,11.37,11.37,10.5,10.5,11.37,11.37)
df <- df %>% mutate_if(is.character,as.factor)

Based on the output of the following command, I want to remove columns 4, 5, 7, 8, 9, and 11 from df.

whichAreBijection(df)
[1] "whichAreBijection: col7 is a bijection of col1. I put it in drop list."
[1] "whichAreBijection: col4 is a bijection of col2. I put it in drop list."
[1] "whichAreBijection: col5 is a bijection of col3. I put it in drop list."
[1] "whichAreBijection: col8 is a bijection of col6. I put it in drop list."
[1] "whichAreBijection: col9 is a bijection of col6. I put it in drop list."
[1] "whichAreBijection: col11 is a bijection of col10. I put it in drop list."
[1] "whichAreBijection: it took me 0.08s to identify 6 column(s) to drop."
[1]  4  5  7  8  9 11
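
The last line of that output is the function's return value: a vector of column positions. As a minimal base R sketch, those positions can be mapped back to column names. The stand-in data frame below only reproduces the column names, and the hypothetical vector `drop_idx` is copied from the printed output rather than recomputed, so that the example runs without dataPreparation.

```r
# Stand-in for df: only the column names matter here (values are placeholders).
df_names_only <- as.data.frame(matrix(0, nrow = 2, ncol = 11))
names(df_names_only) <- paste0("col", 1:11)

# Positions copied from the printed output above (not recomputed here).
drop_idx <- c(4, 5, 7, 8, 9, 11)

# Index the name vector by position to get the names of the flagged columns.
drop_names <- names(df_names_only)[drop_idx]
print(drop_names)
```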

I can remove them manually with:

df$col4 = NULL
df$col5 = NULL
df$col7 = NULL
df$col8 = NULL
df$col9 = NULL
df$col11 = NULL
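
The six assignments above can also be written as a single statement. This is a small base R sketch that assumes `df` is a plain data frame; it uses a stand-in with the same column names (the values are placeholders) so that it runs on its own.

```r
# Stand-in for df with the same 11 column names; values are placeholders.
df <- as.data.frame(matrix(0, nrow = 2, ncol = 11))
names(df) <- paste0("col", 1:11)

# Assigning NULL to several columns at once removes all of them.
df[c("col4", "col5", "col7", "col8", "col9", "col11")] <- NULL

print(names(df))
```

This is still a manual list of names, so it only shortens the code; it does not yet use the result of whichAreBijection.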

However, I would like this to happen automatically.

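I first tried the following, which builds a data frame m from the columns whose numbers whichAreBijection returns, with the aim of eventually removing them from df. It got me nowhere: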

x <- whichAreBijection(df)
y <- length(x)

m <- as.data.frame(matrix(0, ncol = y, nrow = nrow(df)))
i = 1
while (i< y+1) {
  # z <- names(df)[x[i]]
  m[,i] <- df[,x[i]]
  i<- i+1
}

The m produced above contains only the constant entries 4, 5, 7, 8, 9, and 11 instead of the column values.

Yet I see that a simple command like this

m[,1] <- df[,4]

replaces the first column of m with the fourth column of df exactly as expected.

My second problem is giving m the same column names as the corresponding columns in df. This sounds like it should be easy to do.

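  1. Why are the columns not copied into m correctly inside the loop?

  2. How can I make m automatically take the names of the df columns to be dropped as its column names?

  3. Is there a better way that avoids this mess and directly removes the columns suggested by whichAreBijection?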


r dataframe data-cleaning identity-column bijection
1 answer
0 votes

I was able to solve question 1 with the following:

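x <- whichAreBijection(df)   # positions of the columns flagged as bijections
y <- length(x)
m <- as.data.frame(matrix(0, ncol = y, nrow = nrow(df)))
i <- 1
while (i < y + 1) {
    # with = FALSE is data.table syntax: x[i] is then used as a column
    # position rather than evaluated as an expression
    m[, i] <- df[, x[i], with = FALSE]
    i <- i + 1
}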
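
Questions 2 and 3 can be handled without a loop. The `with = FALSE` argument above is data.table syntax, which suggests that `df` was a data.table by the time the loop ran. In a data.table, `df[, x[i]]` evaluates `x[i]` as an expression and returns the number itself, which would explain the constant columns. The following is a minimal sketch, not the package's own method. It converts to a plain data frame first so that base R indexing applies, and it uses a stand-in data frame and a hard-coded `x` (the positions printed in the question) so that it runs without dataPreparation. The names `plain`, `m`, and `cleaned` are illustrative.

```r
# Stand-in for the question's df: same 11 column names, placeholder values.
df <- as.data.frame(matrix(seq_len(22), nrow = 2, ncol = 11))
names(df) <- paste0("col", 1:11)

# In the real workflow this would be x <- whichAreBijection(df);
# here the positions are copied from the output shown in the question.
x <- c(4, 5, 7, 8, 9, 11)

# No-op for a data.frame; turns a data.table into a plain data.frame.
plain <- as.data.frame(df)

# Question 2: selecting by position keeps the original column names.
m <- plain[, x, drop = FALSE]

# Question 3: keep every column whose position is not in x.
# setdiff() still keeps all columns when x is empty, whereas negative
# indexing with an empty vector would select no columns at all.
cleaned <- plain[, setdiff(seq_along(plain), x), drop = FALSE]

print(names(m))
print(names(cleaned))
```

Selecting all flagged columns in one step gives `m` the right names for free, and `cleaned` is the data frame with those columns removed.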