Unused factor levels not dropped after subsetting?

Question

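I have a variable df1$StudyAreaVisitNote that I turned into a factor. But when I subset df1 into BS, the variable doesn't seem to behave like a factor: running table() on the subsetted data shows the same levels that table() would show if run on the original data.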

Why does this happen?

The two workarounds I have found are:

Export the subsetted data and re-import it
Re-assign the column as a factor after subsetting
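Both workarounds seem to work for the same reason: each one builds a brand-new factor from only the values that are still present. A minimal sketch on made-up data, assuming a small hypothetical data frame `toy` in place of the real CSV and a `tempfile()` path in place of the Excel export:

```r
# Hypothetical toy data standing in for df1
toy <- data.frame(
  site  = factor(c("A", "A", "B", "B")),
  visit = factor(c("s1", "s2", "s3", "s4"))
)
sub <- subset(toy, site == "A")
levels(sub$visit)     # "s1" "s2" "s3" "s4": the full level set survives the subset

# Workaround 2: calling factor() again rebuilds the levels from the values present
sub2 <- sub
sub2$visit <- factor(sub2$visit)
levels(sub2$visit)    # "s1" "s2"

# Workaround 1: a CSV round trip writes only the values, not the stored level set
path <- tempfile(fileext = ".csv")
write.csv(sub, path, row.names = FALSE)
reread <- read.csv(path, stringsAsFactors = TRUE)  # explicit, since R >= 4.0 defaults to FALSE
levels(reread$visit)  # "s1" "s2"
```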

Code:

# My dataset can be found here: http://textuploader.com/9tx5  (I'm sure there's a better way to host it, but I'm new, sorry!)
# Load Initial Dataset (df1)
df1 <- read.csv("/Users/user/Desktop/untitled folder/pre_subset.csv", header=TRUE,sep=",")

# Make both columns factors
df1$Trap.Type <- factor(df1$Trap.Type)
df1$StudyAreaVisitNote <-factor(df1$StudyAreaVisitNote)

# Subset out site of interest
BS <- subset(df1, Trap.Type=="HR-BA-BS")

# Export to Excel, save as CSV after it's in excel
library(WriteXLS)
WriteXLS("BS", ExcelFileName = "/Users/user/Desktop/test.xlsx", col.names = TRUE, AdjWidth = TRUE, BoldHeaderRow = TRUE, FreezeRow = 1)


# Load second Dataset (df2)
df2 <- read.csv("/Users/user/Desktop/untitled folder/post_subset.csv", header=TRUE, sep=",")

# both datasets should be identical, and they are superficially, but...
# Have a look at df2
summary(df2$StudyAreaVisitNote)  # Looks good, only counts levels that are present

# Now, look at BS from df1
summary(BS$StudyAreaVisitNote)  # sessions not present in the subsetted data (but present in df1?) are included???

# Make BS$StudyAreaVisitNote a factor...Again??
BS$StudyAreaVisitNote <- factor(BS$StudyAreaVisitNote)

# Run the same summary() call on BS again
summary(BS$StudyAreaVisitNote) # this time it works; why isn't the factor carried through the subset?

Answer 1

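A factor is still a factor even after subsetting; I'm sure class(BS$StudyAreaVisitNote)=="factor". However, factors do not automatically drop unused levels. That is actually helpful when you do something like this: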

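set.seed(16)
dd <- data.frame(
    # factor() keeps this example working in R >= 4.0, where data.frame() no longer converts strings to factors by default
    gender=factor(sample(c("M","F"), 25, replace=TRUE)),
    age=rpois(25, 20)
)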
dd

table(subset(dd, age<15)$gender)
# F M 
# 0 3
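The claim that subsetting keeps both the class and the full level set can be checked directly with class() and levels(). A minimal sketch, assuming a small deterministic data frame `people` (hypothetical) instead of the random `dd`, so the result does not depend on the random number generator:

```r
# Hypothetical, deterministic stand-in for dd
people <- data.frame(
  gender = factor(c("M", "F", "M", "F")),
  age    = c(12, 30, 14, 25)
)
young <- subset(people, age < 15)   # keeps rows 1 and 3, both "M"

class(young$gender)    # "factor": subsetting does not change the class
levels(young$gender)   # "F" "M": the unused level "F" is still recorded
table(young$gender)    # F = 0, M = 2
```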

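Here the factor remembers that it has the levels M and F; even though the subset contains no F values, that level is kept. If you want to get rid of unused levels, you can call droplevels() explicitly.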

table(droplevels(subset(dd, age<15))$gender)
# M
# 3

(Now it has forgotten about F.)
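droplevels() accepts either a single factor or a whole data frame, and indexing a factor vector with `drop = TRUE` drops unused levels as part of the subset itself. A minimal sketch on hypothetical toy data (`people` and its columns are made up for illustration):

```r
# Hypothetical toy data with two factor columns
people <- data.frame(
  gender = factor(c("M", "F", "M", "F")),
  site   = factor(c("X", "Y", "X", "Z")),
  age    = c(12, 30, 14, 25)
)
young <- subset(people, age < 15)   # rows 1 and 3

# Drop unused levels in one column only
young_one <- young
young_one$gender <- droplevels(young_one$gender)
levels(young_one$gender)   # "M"
levels(young_one$site)     # still "X" "Y" "Z"

# Drop unused levels in every factor column at once
young_all <- droplevels(young)
levels(young_all$site)     # "X"

# Subset a factor vector directly; drop = TRUE is an argument of `[` for factors
g <- people$gender[people$age < 15, drop = TRUE]
levels(g)                  # "M"
```

Dropping per column is handy when some other column's full level set should be kept on purpose, for example to report zero counts.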

So rather than using summary(), compare the results of table() on the two data.frames.
