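Unless told otherwise, R converts strings to factors when loading data (this is the default, stringsAsFactors = TRUE, in versions before R 4.0.0). We then have to convert the factors to character or numeric, depending on the underlying data. For numeric values, we first convert to strings with as.character() and then, for integer values, convert the result with as.integer().
But after using gsub to strip characters out of the numbers, R automatically turns the cleaned column into character.
For example: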
> sal <- data.frame(name = c('abc','def','ghi','pqr'),
+ Salary = c('$65,000','$102,000','$85,000','$72,000'))
> str(sal)
'data.frame': 4 obs. of 2 variables:
$ name : Factor w/ 4 levels "abc","def","ghi",..: 1 2 3 4
$ Salary: Factor w/ 4 levels "$102,000","$65,000",..: 2 1 4 3
> sal$Salary <- gsub('\\$','',sal$Salary)
> sal$Salary <- gsub(',','',sal$Salary)
> str(sal)
'data.frame': 4 obs. of 2 variables:
$ name : Factor w/ 4 levels "abc","def","ghi",..: 1 2 3 4
$ Salary: chr "65000" "102000" "85000" "72000"
>
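We can see that after gsub, the Salary column changed from Factor to chr. Can someone tell me whether gsub also performs an as.character() operation here? And if so, why does it not convert the column to integer, given that all the values are integers?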
Yes, gsub performs as.character. If you type gsub in the console, you can see the function:
function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)
{
    if (!is.character(x))
        x <- as.character(x)
    .Internal(gsub(as.character(pattern), as.character(replacement),
        x, ignore.case, perl, fixed, useBytes))
}
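And no, it does not convert to integer, because it always returns a character vector. From ?gsub:
sub and gsub return a character vector of the same length and with the same attributes as x (after possible coercion to character).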
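So to end up with integers, the character result has to be converted explicitly. A minimal sketch using the Salary values from the question (the variable names salary, cleaned and as_int are only illustrative):

```r
# Salary values from the question, stored as a factor
salary <- factor(c('$65,000', '$102,000', '$85,000', '$72,000'))

# gsub coerces its input to character and returns a character vector
cleaned <- gsub('\\$|,', '', salary)
class(cleaned)   # "character"

# Converting to integer is a separate, explicit step
as_int <- as.integer(cleaned)
as_int           # 65000 102000 85000 72000
```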
You can also change the levels of the factor, which are character, directly:
sal <- data.frame(name = c('abc','def','ghi','pqr'),
                  Salary = c('$65,000','$102,000','$85,000','$72,000'),
                  stringsAsFactors = TRUE)  # explicit: the default is FALSE since R 4.0.0
levels(sal$Salary) <- gsub('\\$|,', '', levels(sal$Salary))
str(sal)
> 'data.frame': 4 obs. of 2 variables:
$ name : Factor w/ 4 levels "abc","def","ghi",..: 1 2 3 4
$ Salary: Factor w/ 4 levels "102000","65000",..: 2 1 4 3
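Note that the column is still a factor at this point. If integers are what you want in the end, something along these lines should work; it is only a sketch, and salary here is a stand-alone vector built to mirror the cleaned column above:

```r
# Factor whose levels have already been cleaned, as in the output above
salary <- factor(c('65000', '102000', '85000', '72000'))

# as.integer() on a factor returns the internal level codes, not the values
as.integer(salary)                  # 2 1 4 3

# Go through the level labels to get the actual numbers
as.integer(as.character(salary))    # 65000 102000 85000 72000
as.integer(levels(salary))[salary]  # same result, converts each level only once
```

The second form can be preferable for long vectors with few distinct values, since only the levels are parsed.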