I'm looking at this dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data
I preprocessed the data:
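library(arules)  # provides discretize(), apriori(), subset(), sort() and inspect() for rules
ca.1 <- read.csv("CreditApproval.csv", header = TRUE, sep = ",")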
# From http://stackoverflow.com/q/4787332/
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}
ca.1$A2<-remove_outliers(ca.1$A2)
ca.1$A3<-remove_outliers(ca.1$A3)
ca.1$A8<-remove_outliers(ca.1$A8)
ca.1$A11<-remove_outliers(ca.1$A11)
ca.1$A14<-remove_outliers(ca.1$A14)
ca.1$A15<-remove_outliers(ca.1$A15)
ca.1$A2<-discretize(ca.1$A2,"frequency",categories = 6)
ca.1$A3<-discretize(ca.1$A3,"frequency",categories = 6)
ca.1$A8<-discretize(ca.1$A8,"frequency",categories = 6)
ca.1$A11<-discretize(ca.1$A11,"frequency",categories = 6)
ca.1$A14<-discretize(ca.1$A14,"frequency",categories = 6)
ca.1$A15<-discretize(ca.1$A15,"frequency",categories = 6)
ca.1<-na.omit(ca.1)
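As a small illustration of what this preprocessing does, the sketch below runs the same 1.5 * IQR rule on a made-up vector (the values are invented for the example, not taken from the credit data). It shows that an outlier becomes NA, which means a later na.omit() drops the whole row it sits in.

```r
# Same 1.5 * IQR rule as remove_outliers above, repeated so this runs on its own.
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Made-up values for illustration; 100 lies far outside the bulk of the data.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
cleaned <- remove_outliers(x)
print(cleaned)

# The outlier is now NA, so na.omit() removes that row entirely.
kept <- na.omit(data.frame(x = cleaned))
nrow(kept)
```

With six columns cleaned this way, a row is lost if any one of them holds an outlier, so the data that reaches apriori() can be noticeably smaller than the raw file.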
After tweaking support, confidence, and minlen/maxlen, I still get 65 rules:
> rules<-apriori(ca.1, parameter= list(supp=0.15, conf=0.89, minlen=3, maxlen=4), appearance=list(rhs=c("class=-", "class=+"), default="lhs"))
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)
    lhs                    rhs         support   confidence lift
[1] {A5=g,A9=t,A10=t}   => {class=+} 0.1521739 0.8974359  2.770607
[2] {A4=u,A9=t,A10=t}   => {class=+} 0.1521739 0.8974359  2.770607
[3] {A1=a,A9=f}         => {class=-} 0.1717391 0.9753086  1.442579
[4] {A1=a,A9=f,A13=g}   => {class=-} 0.1608696 0.9736842  1.440176
...[65]
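You can see that the + rules have a higher lift, but lower support and confidence, than the - rules. I've been going through the documentation and can't find any parameter for constraining by lift. Is this possible? If not, what do you do in a situation like this?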
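One way to see why the + rules score a higher lift despite lower confidence is to use the standard definition lift = confidence / support(rhs). The sketch below back-computes the class frequencies from the numbers printed in the inspect() output above; nothing else is assumed.

```r
# Values copied from the inspect() output in the question.
conf_plus  <- 0.8974359   # rule [1], rhs class=+
lift_plus  <- 2.770607
conf_minus <- 0.9753086   # rule [3], rhs class=-
lift_minus <- 1.442579

# lift = confidence / support(rhs), so support(rhs) = confidence / lift.
supp_plus  <- conf_plus / lift_plus
supp_minus <- conf_minus / lift_minus
round(c(plus = supp_plus, minus = supp_minus), 3)

# The highest lift any rule for a class can reach is 1 / support(rhs).
round(c(plus = 1 / supp_plus, minus = 1 / supp_minus), 2)
```

The recovered frequencies come out at roughly 0.32 for class=+ and 0.68 for class=-. Since class=- is the majority class in the preprocessed data, even a nearly perfect - rule cannot get a lift much above 1.4, while + rules have far more headroom.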
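The arules package defines a dedicated subset function for this type of object. To filter out rules with a lift below 2 (that is, keep only those with lift above 2), you can try the following: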
subset(rules, subset = lift > 2)
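Here is a self-contained sketch of that subset() call. Because the CreditApproval.csv file is not available here, it uses the Groceries transactions that ship with arules as a stand-in, and the support and confidence thresholds are arbitrary choices for the example.

```r
library(arules)

# Groceries ships with arules; it stands in for the ca.1 data here.
data("Groceries")

# Mine rules with support/confidence thresholds only (values chosen for the demo).
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.3, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only the rules whose lift exceeds 2.
high_lift <- subset(rules, subset = lift > 2)

length(rules)      # number of rules before filtering
length(high_lift)  # number of rules after filtering
inspect(head(sort(high_lift, by = "lift"), n = 5))
```

The filtering happens after mining, so it does not reduce the work apriori() has to do; it only trims the result.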
What if you tried
apriori(df, parameter = list(lift = 0.3, minlen =2))
In this case you can set the minimum lift to any value; I just picked 0.3.
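I don't think the apriori function accepts lift as one of its parameters. If I try to set lift, I get this error:
Error: invalid parameter: lift
Instead, I can sort the rules by lift and pick rules based on their lift values, like this:
sort(rules, by = "lift", decreasing = TRUE)
It isn't a direct solution, but it's a decent one.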
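You can't constrain apriori rules by lift alone. You first have to apply the support and confidence constraints, which you did here:
rules<-apriori(ca.1, parameter= list(supp=0.15, conf=0.89, minlen=3, maxlen=4))
After that, do something like this: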
rulesLift <- sort(subset(rules, subset = lift < 2), by="lift")
inspect(rulesLift)
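Note that lift < 2 keeps the low-lift rules; flipping the comparison to lift > 2 keeps the high-lift ones instead. The sketch below walks through the same mine, subset, sort sequence end to end. It is only an illustration: Groceries (bundled with arules) replaces the credit data, "whole milk" plays the role of a class label on the right-hand side, and the thresholds are arbitrary.

```r
library(arules)

# Groceries ships with arules; it stands in for the ca.1 data here.
data("Groceries")

# Step 1: mine with support/confidence limits, restricting the rhs via appearance.
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.3, minlen = 2),
                 appearance = list(rhs = "whole milk", default = "lhs"),
                 control = list(verbose = FALSE))

# Step 2: filter on lift afterwards, then sort so the strongest rules come first.
rulesLift <- sort(subset(rules, subset = lift > 1.5), by = "lift")

inspect(head(rulesLift, n = 5))
```

If the two classes need different thresholds, one option is to run this once per rhs value and combine the filtered results.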