How do I keep only certain rows in categorical data?

Question

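I have a dataset from an animal shelter in which the variable "breed" has more than 50 distinct values. Looking at a summary of the data, four breeds dominate. My question is: how do I create a dataset that contains only those four breeds (and leaves all the other variables unchanged)?

Here is what I have tried so far (meow2 is the original data):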

meow3 <- meow2[ which(meow2$breed1=="domestic shorthair" & "domestic mediumhair" & "domestic longhair" & "siamese"),]

Some online research suggested that I create a subset. This is my attempt:

meow3 <- subset(meow2, breed1=="domestic shorthair" "domestic meduimhair" "domestic longhair" "siamese")

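I'm sure there are some syntax problems, but I am having a hard time finding online resources for this. I also tried looking up the errors, but nothing seemed to work.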

Answer 1

The simplest approach is to use %in%:

common_breeds <- c("domestic shorthair","domestic mediumhair",
          "domestic longhair", "siamese")   
meow3 <- subset(meow2, breed1 %in% common_breeds)

You could also do something like

 ... breed1=="domestic shorthair" | breed1=="domestic mediumhair" |
     breed1=="domestic longhair" | breed1=="siamese" ...

(Note that you need to use | (or) rather than & (and): a row can match only one breed at a time, never all four.)
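As a rough sketch of why the two forms above select the same rows, the following uses a small made-up data frame; only the names meow2 and breed1 come from the question, while the values and the age column are assumptions for illustration. The original attempt fails because & needs logical operands on both sides, and a bare string such as "domestic mediumhair" is not one.

```r
# Toy stand-in for the asker's data; values are invented for illustration.
meow2 <- data.frame(
  breed1 = c("domestic shorthair", "siamese", "persian",
             "domestic longhair", "maine coon", "domestic mediumhair"),
  age = c(2, 5, 3, 7, 1, 4),
  stringsAsFactors = TRUE
)

common_breeds <- c("domestic shorthair", "domestic mediumhair",
                   "domestic longhair", "siamese")

# Approach 1: %in% tests each row's breed against the whole vector.
by_in <- subset(meow2, breed1 %in% common_breeds)

# Approach 2: one == comparison per breed, joined with | (or).
by_or <- meow2[meow2$breed1 == "domestic shorthair" |
               meow2$breed1 == "domestic mediumhair" |
               meow2$breed1 == "domestic longhair" |
               meow2$breed1 == "siamese", ]

# Check that both approaches picked the same rows.
identical(rownames(by_in), rownames(by_or))

# Subsetting keeps unused factor levels ("persian", "maine coon");
# droplevels() removes them if they clutter later summaries.
by_in <- droplevels(by_in)
levels(by_in$breed1)
```

The droplevels() step is optional and only matters when breed1 is stored as a factor.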

Answer 2

You can use dplyr and forcats. In this example I keep only the single most frequent level; change that number as needed for your data.

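library(tidyverse)

# Toy data: level "1" occurs three times, "2" twice, "3" once
testing <- data.frame(factors = factor(c(1,1,1,2,2,3)))

# Lump everything except the most frequent level into "other", then drop those rows
testing %>% 
  mutate(factors = factors %>% fct_lump_n(n = 1,other_level = "other")) %>% 
  filter(factors != "other")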

If you have a specific list of levels to keep, use the fct_other function instead:

testing2 <-
  data.frame(factors = factor(
    c(
      "domestic shorthair",
      "domestic mediumhair",
      "domestic longhair",
      "siamese"
    )
  ))

testing2 %>% 
  mutate(factors = factors %>% fct_other(keep = c("domestic shorthair","domestic mediumhair"),other_level = "other")) %>%
  filter(factors != "other")
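Applied to the question's situation, the frequency-based approach might look like the sketch below. The data frame is invented (four frequent breeds plus two rare ones, with a hypothetical id column); only meow2, meow3 and breed1 are names from the question. Treat it as an illustration under those assumptions, and note that ties at the cutoff can change how many levels fct_lump_n keeps.

```r
library(dplyr)
library(forcats)

# Made-up data: four frequent breeds and two rare ones (16 rows in total).
meow2 <- data.frame(
  breed1 = factor(c(rep("domestic shorthair", 5),
                    rep("domestic mediumhair", 4),
                    rep("domestic longhair", 3),
                    rep("siamese", 2),
                    "persian", "maine coon")),
  id = 1:16
)

meow3 <- meow2 %>%
  # Keep the 4 most frequent breeds; everything else becomes "other".
  mutate(breed1 = fct_lump_n(breed1, n = 4, other_level = "other")) %>%
  # Drop the rows that were lumped into "other".
  filter(breed1 != "other") %>%
  # Remove the now-empty "other" level from the factor.
  mutate(breed1 = fct_drop(breed1))

table(meow3$breed1)
```

This avoids typing the breed names by hand, at the cost of depending on the counts in the data; if the four breeds are already known, the fct_other or %in% forms are more explicit.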
