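I have a variable of 0's and 1's that I want to convert to NO's and YES's. The variable is named default and the dataset is named credit_train. My first attempt, which did not work, was to turn the integer variable default into a factor with credit_train$default <- factor(credit_train$default). This changed the class as follows: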
> class(credit_train$default)
[1] "integer"
> class(credit_train$default)
[1] "factor"
The following decision tree algorithm requires a factor:
credit_model <- C5.0(credit_train[-1],credit_train$default)
However, inspecting the model shows the following (tree size = 0):
> credit_model
Call:
C5.0.default(x = credit_train[-1], y = credit_train$default)
Classification Tree
Number of samples: 900
Number of predictors: 20
Tree size: 0
Non-standard options: attempt to group attributes
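One thing that may be worth ruling out before relabelling: credit_train[-1] drops the first column by position, so it only removes the target if default happens to be column 1. The sketch below selects the predictors by name instead. The toy data frame and its duration and amount columns are hypothetical stand-ins, not the real german.csv layout, and this is not a confirmed cause of the empty tree.

```r
# Toy data frame; column names and values are hypothetical stand-ins
# for credit_train, not the real german.csv layout.
credit_train <- data.frame(
  default  = factor(c(0, 1, 0, 1)),
  duration = c(6, 48, 12, 42),
  amount   = c(1169, 5951, 2096, 7882)
)

# Select every column except the target, by name rather than by position.
predictors <- credit_train[, setdiff(names(credit_train), "default"), drop = FALSE]

# Sanity checks to run before C5.0(predictors, credit_train$default).
stopifnot(!("default" %in% names(predictors)))  # target is not among the predictors
stopifnot(is.factor(credit_train$default))      # target is a factor

names(predictors)
table(credit_train$default)
```

Selecting by name keeps the call correct even if the column order of the CSV changes.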
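So I am now trying to set the factor levels to YES and NO, since the 1's and 0's may be the problem.
Here is the complete code, up to the point where the problem occurs: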
install.packages("C50", dependencies=TRUE, repos='http://cran.rstudio.com/')
library(C50) # Gives the decision tree algorithm

#######Step 2: EXploring and Preparing the Data####
credit <- read.csv("german.csv")
credit
str(credit)
table(credit$account_check_status)
table(credit$savings)
summary(credit$duration_in_month)
summary(credit$credit_amount)

# A successful model that identifies applicants who are at
# high risk of default, allowing the bank to refuse the credit
# request before the money is given.
table(credit$default)

# Data Preparation: Create RANDOM training and test datasets
# Use 90% data for training & 10% data for testing
# B/C its not RANDOM (bank sorted data by loan amount, largest
# at end of the file & so train only on the smallest loans)
set.seed(123)

# select 900 values at random out of the sequence of integers
# of 1 to 1,000
train_sample <- sample(1000,900)

# Shows the random selection
str(train_sample)

# The 'train_sample'(900) is passed as selected rows.
credit_train <- credit[train_sample,]

# The REMAINING rows NOT passed (100) become the test
credit_test <- credit[-train_sample,]

# Check to see if randomization was done correctly by having
# 30 percent of loans with default in each of the datasets
prop.table(table(credit_train$default))
prop.table(table(credit_test$default))

#####STEP3: Training a model on the Data ######
credit_model <- C5.0(credit_train[-1],credit_train$default)
Here is the dataset:
This should do it:
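One way to do the conversion, sketched here on a toy data frame, is to call factor() with explicit levels and labels. Only the names credit_train and default come from the question; the five values are made up for illustration.

```r
# Toy stand-in for credit_train; the values are hypothetical.
credit_train <- data.frame(default = c(0L, 1L, 1L, 0L, 0L))

# Map 0 -> "NO" and 1 -> "YES".
# 'levels' lists the existing values in order; 'labels' renames them.
credit_train$default <- factor(credit_train$default,
                               levels = c(0, 1),
                               labels = c("NO", "YES"))

class(credit_train$default)   # "factor"
levels(credit_train$default)  # "NO" "YES"
table(credit_train$default)
```

If default has already been converted to a factor with levels 0 and 1, assigning levels(credit_train$default) <- c("NO", "YES") renames the levels in place. Any value not listed in levels becomes NA, so it is worth checking table(credit_train$default, useNA = "ifany") afterwards.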